Abstract

The problem of stabilizing an unstable plant over a noisy communication link is an increasingly
important one that arises in applications of networked control systems. Although the work of Schulman
and Sahai over the past two decades, and their development of the notions of "tree codes" and "anytime
capacity", provides the theoretical framework for studying such problems, there has been scant practical
progress in this area because explicit constructions of tree codes with efficient encoding and decoding
did not exist. To stabilize an unstable plant driven by bounded noise over a noisy channel one needs
real-time encoding and real-time decoding and a reliability which increases exponentially with decoding
delay, which is what tree codes guarantee. We prove that linear tree codes occur with high probability
and, for erasure channels, give an explicit construction with an expected decoding complexity that is
constant per time instant. We give novel sufficient conditions on the rate and reliability required of the
tree codes to stabilize vector plants and argue that they are asymptotically tight. This work takes an
important step towards controlling plants over noisy channels, and we demonstrate the efficacy of the
method through several examples.

I. INTRODUCTION

Control theory deals with regulating the behavior of dynamical systems using real-time output
feedback. Most traditional control systems are characterized by the measurement and control
subsystems being co-located. Hence, there was no loss of measurement and control signals in the
feedback loop. There is a very mature theory for this setup and there are concrete theoretical tools
to analyze the overall system performance and its robustness to modeling errors [1]. There are
increasingly many applications of networked control systems, however, where the measurement
and control signals are communicated over noisy channels. Some examples include the smart
grid, distributed computation, intelligent highways, etc. (e.g., see [2]).

Applications of networked control systems represent different levels of decentralization in their 
structure. At a high level, the measurement unit and the controller are not co-located but each 
is individually centralized. In addition, the measurement and control subsystems are themselves 
comprised of arrays of sensors and actuators that in turn communicate with each other over a 
network. Our focus is on the former. We consider the setup where the measurement and control
subsystems are individually centralized but are separated by communication channels.

Several aspects of this problem have been studied in the literature [3]-[7]. When the
communication links are modeled as rate-limited noiseless channels, significant progress has been
made (see e.g., [8]-[10]) in understanding the bandwidth requirements for stabilizing open loop
unstable systems. [11] considered robust feedback stabilization over communication channels
that are modeled as variable rate digital links where the encoder has causal knowledge of the
number of bits transmitted error free. Under a packet erasure model, [12] studied the problem of
LQG (Linear Quadratic Gaussian) control in the presence of measurement erasures and showed
that closed loop mean squared stability is not possible if the erasure probability is higher than
a certain threshold. So, clearly the measurement and control signals need to be encoded to
compensate for the channel errors.

There are two key differences between the communication paradigm for distributed control 
and that traditionally studied in information theory. Shannon's information theory, in large part, 
is concerned with reliable one-way communication while communication for control is fundamentally
interactive: the plant measurements to be encoded are determined by the control inputs,
which in turn are determined by how the controller decodes the corrupted plant measurements. 
Furthermore, conventional channel codes achieve reliability at the expense of delay which, if 
present in the feedback loop of a control system, can adversely affect its performance. 



In this context, [13] provides a necessary and sufficient condition on the communication
reliability needed over channels that are in the feedback loop of unstable scalar linear processes, 
and proposes the notion of anytime capacity as the appropriate figure of merit for such channels. 
In essence, the encoder is causal and the probability of error in decoding a source symbol that 
was transmitted d time instants ago should decay exponentially in the decoding delay d. 



Although the connection between communication reliability and control is clear, very little is
known about error-correcting codes that can achieve such reliabilities. Prior to the work of [13],
and in the context of distributed computation, [14] proved the existence of codes which under
maximum likelihood decoding achieve such reliabilities and referred to them as tree codes. Note
that any real-time error correcting code is causal and since it encodes the entire trajectory of a
process, it has a natural tree structure to it. [14] proves the existence of nonlinear tree codes and
gives no explicit constructions or efficient decoding algorithms. [15] and [14] also propose
sequential decoding algorithms whose expected complexity per time instant is fixed but the
probability that the decoder complexity exceeds $C$ decays with a heavy tail, i.e., only as a
negative power of $C$. Much more recently, [16] proposed efficient error correcting codes for
unstable systems where the state grows only polynomially large with time. When the state of an
unstable scalar linear process is available at the encoder and when there is noiseless feedback of
channel outputs, [17] and [18] develop encoding-decoding schemes that can stabilize such a process
over the binary symmetric channel and the binary erasure channel respectively. But when the state
is available only through noisy measurements or when there is no channel feedback, little is known
in the way of stabilizing an unstable scalar linear process over a stochastic communication channel.

The subject of error correcting codes for control is in its relative infancy, much as the subject
of block coding was after Shannon's seminal work in [19]. So, a first step towards realizing
practical encoder-decoder pairs with anytime reliabilities is to explore linear encoding schemes.
We consider rate $R = k/n$ causal linear codes which map a sequence of $k$-dimensional binary
vectors $\{b_\tau\}_{\tau=0}^{\infty}$ to a sequence of $n$-dimensional binary vectors
$\{c_\tau\}_{\tau=0}^{\infty}$, where $c_t$ is only a function of $\{b_\tau\}_{\tau=0}^{t}$.
Such a code is anytime reliable if at all times $t$ and delays $d \ge d_o$,
$P(\hat{b}_{t-d|t} \ne b_{t-d}) \le \eta 2^{-\beta n d}$ for some $\beta > 0$. We show that linear
tree codes exist and further, that they exist with a high probability. For the binary erasure
channel, we propose a maximum likelihood decoder whose average decoding complexity is constant per
time step and for which the probability that the complexity at a given time $t$ exceeds $KC^3$
decays exponentially in $C$. This allows one to stabilize a partially observed unstable scalar
linear process over a binary erasure channel and, to the best of the authors' knowledge, this has
not been done before.

In Section II, we present some background and motivate the need for anytime reliability with
a simple example. In Section IV, we come up with a sufficient condition for anytime reliability
in terms of the weight distribution of the code. In Section V, we introduce the ensemble of
time invariant codes and use the results from Section IV to prove that time invariant codes with
anytime reliability exist with a high probability. In Section VI, we invoke some standard results
from the literature on coding theory to improve the results obtained in Section V. In Section
VII, we present a simple decoding algorithm for the erasure channel.



II. Background 

Owing to the duality between estimation and control, the essential complexity of stabilizing 
an unstable process over a noisy communication channel can be captured by studying the open 
loop estimation of the same process. We will motivate the kind of communication reliability 
needed for control through a simple example. 

A toy example: Consider tracking the following random walk, $x_{t+1} = \lambda x_t + w_t$, where
$w_t$ is Bernoulli(1/2), i.e., 0 or 1 with equal probability, $x_0 = 0$ and $|\lambda| > 1$. Suppose
an observer observes $x_t$ and communicates over a noisy communication channel to an estimator. Also
assume that the estimator knows the system model and the initial state $x_0 = 0$. The observer clearly
needs to communicate whether $w_t$ is 0 or 1. Note that the observer only has causal access to
$\{w_i\}$, i.e., at any time $t$, the observer has access to $\{w_0, \ldots, w_{t-1}\}$. Let the
encoding function of the observer at time $t$ be $f_t : \{0,1\}^t \to \mathcal{X}^n$, where
$\mathcal{X}$ is the channel input alphabet and $n$ is the number of channel uses available for each
step of the system evolution. One can visualize such a causal encoding process over a binary tree as
in Fig. 1. While the information bits determine the path in the tree, the label on each branch denotes
the symbol transmitted by the observer/encoder. The codeword associated to a given path in the tree is
given by the concatenation of the branch symbols along that path. Upon receiving the channel outputs
until time $t$, the estimator generates estimates $\{\hat{w}_{0|t}, \ldots, \hat{w}_{t-1|t}\}$ of the
noise sequence $\{w_0, w_1, \ldots, w_{t-1}\}$. Then, the estimator's estimate of the state,
$\hat{x}_{t+1|t}$, is given by

$$\hat{x}_{t+1|t} = \sum_{j=0}^{t} \lambda^{t-j} \hat{w}_{j|t} \tag{1}$$

Suppose $P^e_{t,d} = P\left(\arg\min_j \{\hat{w}_{j|t} \ne w_j\} = t-d+1\right)$, i.e., $P^e_{t,d}$
is the probability that the position of the earliest erroneous $\hat{w}_{j|t}$ is at time
$j = t-d+1$. The probability here is over the randomness of the channel.

Fig. 1. One can visualize any causal code on a tree. The distance property is:
$\|c - c'\|_H \propto d$. This must be true for any two paths with a common root and of equal length
in the tree.

From (1), we can bound $E|x_{t+1} - \hat{x}_{t+1|t}|^2$ from above as



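$$E\left|x_{t+1} - \hat{x}_{t+1|t}\right|^2 \le \sum_{d \le t} P^e_{t,d} \left| \sum_{j=t-d+1}^{t} \lambda^{t-j} (w_j - \hat{w}_{j|t}) \right|^2 \le \frac{1}{(|\lambda|-1)^2} \sum_{d \le t} P^e_{t,d} |\lambda|^{2d}$$

The second inequality uses $|w_j - \hat{w}_{j|t}| \le 1$ and sums the resulting geometric series.
Clearly, a sufficient condition for $\limsup_t E|x_{t+1} - \hat{x}_{t+1|t}|^2$ to be finite is as
follows:

$$P^e_{t,d} \le |\lambda|^{-(2+\delta)d}, \quad \forall\, d \ge d_o,\ t > t_o \text{ and } \delta > 0 \tag{2}$$

where $d_o$ and $t_o$ are constants that do not depend on $t, d$.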
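As a quick numerical illustration of why (2) suffices, the sketch below evaluates the right-hand side of the bound when $P^e_{t,d}$ is set equal to $|\lambda|^{-(2+\delta)d}$ for every delay $d \ge 1$. Taking $d_o = 1$ and meeting (2) with equality are assumptions made only for illustration, and the function names are hypothetical.

```python
def mse_upper_bound(lam: float, delta: float, t: int) -> float:
    """Evaluate (1/(|lam|-1)^2) * sum_{d=1}^{t} P_d * |lam|^(2d),
    with P_d = |lam|^(-(2+delta) d), i.e., condition (2) met with equality."""
    a = abs(lam)
    total = sum(a ** (-(2 + delta) * d) * a ** (2 * d) for d in range(1, t + 1))
    return total / (a - 1) ** 2


def limiting_bound(lam: float, delta: float) -> float:
    """Closed form of the t -> infinity limit for delta > 0:
    a geometric series with ratio r = |lam|^(-delta)."""
    a = abs(lam)
    r = a ** (-delta)
    return r / ((1 - r) * (a - 1) ** 2)


if __name__ == "__main__":
    for t in (10, 50, 200):
        print(t, mse_upper_bound(2.0, 0.5, t))   # settles towards the limit
    print("limit:", limiting_bound(2.0, 0.5))
    print("delta = 0:", mse_upper_bound(2.0, 0.0, 100))  # grows linearly in t
```

With $\delta > 0$ the partial sums converge, whereas with $\delta = 0$ every term equals one and the bound grows linearly in $t$, which is why the extra $\delta$ appears in the exponent of (2).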

In the context of control, it was first observed in [13] that exponential reliability of the form (2)
is required to stabilize unstable plants over noisy communication channels. For a given channel,
encoder-decoder pairs that achieve (2) are said to be anytime reliable. This definition will be
made more precise in Section III. In the context of distributed computation, it was observed in
[14] that a causal code under maximum likelihood decoding over a discrete memoryless channel
is anytime reliable provided that the code has a certain distance property which is illustrated
in Fig. 1. Avoiding mathematical clutter, one can describe the distance property as follows. For
any two paths with a common root and of equal length in the tree whose least common ancestor
is at a height $d$ from the bottom, the Hamming distance between their codewords should be
proportional to $d$. [14] referred to codes with this distance property as tree codes and showed
that they exist. There has recently been increased interest (e.g., [20]-[22]) in studying tree codes
for interactive communication problems. But the tree codes are, in general, non-linear and the
existence was not with high probability.

We will prove the existence, with high probability, of linear tree codes and exploit the linearity
to develop an efficiently decodable anytime reliable code for the erasure channel.

Fig. 2. Causal encoding and decoding

TABLE I

Symbol                                        Meaning
$H(\cdot)$                                    The binary entropy function
$H^{-1}(y)$                                   The smaller root of the equation $H(x) = y$
For a matrix $F$, $\bar{F}$                   abs($F$), i.e., $\bar{F}_{ij} = |F_{ij}|$
$\rho(F)$                                     Spectral radius of $F$
For a vector $x$, $x^{(i)}$                   The $i$-th component of $x$
$1_m$                                         $[1, \ldots, 1]^T$, i.e., a column with $m$ 1's
For $v, w \in \mathbb{R}^m$, $w \preceq v$    Component-wise inequality
$\log(\cdot)$                                 Logarithm in base 2
For $0 < x, y < 1$, $KL(x\|y)$                $x \log\frac{x}{y} + (1-x)\log\frac{1-x}{1-y}$, i.e., Kullback-Leibler divergence
                                              between Bernoulli($x$) and Bernoulli($y$)



III. Problem Setup

The notation to be used in the rest of the paper is summarized in Table I. Consider the
following $m_x$-dimensional unstable linear system with $m_y$-dimensional measurements. Assume
that $(F, H)$ is observable and $(F, G)$ is controllable.

$$x_{t+1} = F x_t + G u_t + w_t, \qquad y_t = H x_t + v_t \tag{3}$$

where $\rho(F) > 1$, $u_t$ is the $m_u$-dimensional control input, and $w_t$ and $v_t$ are bounded
process and measurement noise variables, i.e., $\|w_t\|_\infty$ and $\|v_t\|_\infty$ are each
bounded by a fixed constant for all $t$. The measurements $\{y_t\}$ are made by an observer while the
control inputs $\{u_t\}$ are applied by a remote controller that is connected to the observer by a
noisy communication channel. We assume that the control input is available to the plant losslessly.
We do not assume that the observer has access to either the channel outputs or the control inputs.
Although it has been shown to be possible, e.g., in [9], [13], we do not use the control actions to
communicate the channel outputs back to the observer through the plant because this could have a
detrimental effect on the performance of the controller.

Before proceeding further, a word is in order about the boundedness assumption on the noise.
If the process and/or measurement noise have unbounded support, it is not clear how one can
stabilize the system without additional assumptions on the channel. For example, [11] assumes
feedback of channel outputs to the observer in order to stabilize an unstable process perturbed
by Gaussian noise over an erasure channel while [23] proposes a forward side channel between
the observer and the controller that has a positive zero error capacity. We avoid this difficulty by
assuming that the noise has bounded support which may be a reasonable assumption to make
in practice.

The measurements $y_{0:t-1}$ will need to be quantized and encoded by the observer to provide
protection from the noisy channel while the controller will need to decode the channel outputs
to estimate the state $x_t$ and apply a suitable control input $u_t$. This can be accomplished by
employing a channel encoder at the observer and a decoder at the controller. For simplicity, we
will assume that the channel input alphabet is binary. Suppose one time step of system evolution
in (3) corresponds to $n$ channel uses¹, i.e., $n$ bits can be transmitted for each measurement
of the system. Then, at each instant of time $t$, the operations performed by the observer, the
channel encoder, the channel decoder and the controller can be described as follows. The observer
generates a $k$-bit message, $b_t \in GF_2^k$, that is a causal function of the measurements, i.e., it
depends only on $y_{0:t}$. Then the channel encoder causally encodes $b_{0:t} \in GF_2^{kt}$ to
generate the $n$ channel inputs $c_t \in GF_2^n$. Note that the rate of the channel encoder is
$R = k/n$. Denote the $n$ channel outputs corresponding to $c_t$ by $z_t \in \mathcal{Z}^n$, where
$\mathcal{Z}$ denotes the channel output alphabet. Using the channel outputs received so far, i.e.,
$z_{0:t} \in \mathcal{Z}^{nt}$, the channel decoder generates estimates
$\{\hat{b}_{\tau|t}\}_{\tau \le t}$ of $\{b_\tau\}_{\tau \le t}$, which, in turn, the controller uses
to generate the control input $u_{t+1}$. This is illustrated in Fig. 2. Now, define

$$P^e_{t,d} = P\left(\min\{\tau : \hat{b}_{\tau|t} \ne b_\tau\} = t-d+1\right)$$

Thus, $P^e_{t,d}$ is the probability that the earliest error is $d$ steps in the past.

¹ In practice, the system evolution in (3) is obtained by discretizing a continuous time differential
equation. So, the interval of discretization could be adjusted to correspond to an integer number of
channel uses, provided the channel use instances are close enough.

Definition 1 (Anytime reliability): Given a channel, we say that an encoder-decoder pair is
$(R, \beta, d_o)$-anytime reliable over that channel if

$$P^e_{t,d} \le \eta\, 2^{-n\beta d}, \quad \forall\, t,\ d \ge d_o \tag{4}$$

In some cases, we write that a code is $(R, \beta)$-anytime reliable. This means that there exists a
fixed $d_o > 0$ such that the code is $(R, \beta, d_o)$-anytime reliable. □



We will show in Sections VIII and IX that $(R, \beta)$-anytime reliability with an appropriately
large rate, $R$, and exponent, $\beta$, is a sufficient condition to stabilize (3) in the mean squared
sense². In what follows, we will demonstrate causal linear codes which under maximum likelihood (ML)
decoding achieve such exponential reliabilities.

IV. Linear Anytime Codes 

As discussed earlier, a first step towards developing practical encoding and decoding schemes 
for automatic control is to study the existence of linear codes with anytime reliability. We will
begin by defining a causal linear code.

Definition 2 (Causal Linear Code): A causal linear code is a sequence of linear maps
$f_\tau : GF_2^{k\tau} \to GF_2^n$ and hence can be represented as

$$f_\tau(b_{1:\tau}) = G_{\tau 1} b_1 + G_{\tau 2} b_2 + \cdots + G_{\tau\tau} b_\tau \tag{5}$$

where $G_{ij} \in GF_2^{n \times k}$. □



² This can be easily extended to any other norm.



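TABLE II

Symbol                    Meaning
$\mathbb{H}^t_{n,R}$      $\bar{n}t \times nt$ leading principal minor of $\mathbb{H}_{n,R}$
$\mathcal{C}_t$           $\{c \in \{0,1\}^{nt} : \mathbb{H}^t_{n,R}\, c = 0\}$
$\mathcal{C}_{t,d}$       $\{c \in \mathcal{C}_t : c_{\tau < t-d+1} = 0,\ c_{t-d+1} \ne 0\}$
$\|c\|$                   Hamming weight of $c$
$N^t_{w,d}$               $|\{c \in \mathcal{C}_{t,d} : \|c\| = w\}|$
$w^t_{\min,d}$            $\arg\min_w (N^t_{w,d} \ne 0)$
$P^e_{t,d}$               $P\left(\min\{\tau : \hat{b}_{\tau|t} \ne b_\tau\} = t-d+1\right)$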



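We denote $c_\tau = f_\tau(b_{1:\tau})$. Note that a tree code is a more general construction where
$f_\tau$ need not be linear. Also note that the associated code rate is $R = k/n$. The above encoding
is equivalent to using a semi-infinite block lower triangular generator matrix $\mathbb{G}_{n,R}$
given by

$$\mathbb{G}_{n,R} = \begin{bmatrix} G_{11} & & & & \\ G_{21} & G_{22} & & & \\ \vdots & \vdots & \ddots & & \\ G_{\tau 1} & G_{\tau 2} & \cdots & G_{\tau\tau} & \\ \vdots & \vdots & & \vdots & \ddots \end{bmatrix}$$

One can equivalently represent the code with a parity check matrix $\mathbb{H}_{n,R}$, where
$\mathbb{H}_{n,R}\mathbb{G}_{n,R} = 0$. The parity check matrix is in general not unique but it is
easy to see that one can choose $\mathbb{H}_{n,R}$ to be block lower triangular too.

$$\mathbb{H}_{n,R} = \begin{bmatrix} H_{11} & & & & \\ H_{21} & H_{22} & & & \\ \vdots & \vdots & \ddots & & \\ H_{\tau 1} & H_{\tau 2} & \cdots & H_{\tau\tau} & \\ \vdots & \vdots & & \vdots & \ddots \end{bmatrix} \tag{6}$$

where $H_{ij} \in \{0,1\}^{\bar{n} \times n}$ and $\bar{n} = n(1-R)$. In fact, we present all our
results in terms of the parity check matrix. Before proceeding further, some of the notation specific
to coding is summarized in Table II.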
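The causal structure in (5) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper names (`random_block`, `matvec_gf2`, `encode_causal`) and the nested-list representation of the blocks $G_{\tau j}$ are assumptions made for the example, with indices shifted to start at zero.

```python
import random


def random_block(n, k, rng):
    """Hypothetical helper: an n x k binary block with i.i.d. uniform entries."""
    return [[rng.randint(0, 1) for _ in range(k)] for _ in range(n)]


def matvec_gf2(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]


def encode_causal(G, bits):
    """G[tau][j] holds the block G_{tau+1, j+1} for j <= tau; bits[j] is b_{j+1}.
    Returns c with c[tau] = sum_{j <= tau} G[tau][j] b_j over GF(2), as in (5)."""
    out = []
    for tau in range(len(bits)):
        c = [0] * len(G[tau][0])
        for j in range(tau + 1):          # only past and present messages are used
            y = matvec_gf2(G[tau][j], bits[j])
            c = [a ^ b for a, b in zip(c, y)]
        out.append(c)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    T, n, k = 4, 4, 2                      # rate R = k/n = 1/2
    G = [[random_block(n, k, rng) for _ in range(tau + 1)] for tau in range(T)]
    b = [[rng.randint(0, 1) for _ in range(k)] for _ in range(T)]
    print(encode_causal(G, b))
```

Because block row $\tau$ only touches $b_1, \ldots, b_\tau$, changing a later message never alters an earlier output, which is exactly the block lower triangular shape of $\mathbb{G}_{n,R}$.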



The objective is to study the existence of causal linear codes which are $(R, \beta)$-anytime reliable
under maximum likelihood (ML) decoding. With reference to Fig. 1, this amounts to choosing
the branch labels, $f_\tau(b_{1:\tau})$, in such a way that they satisfy the distance property, and
also are linear functions of the input, $b_{1:\tau}$. Further, we are interested in characterizing the
thresholds on the rate, $R$, and exponent, $\beta$, for which such codes exist. In the interest of
clarity, we will begin with a self-contained discussion of a weak sufficient condition on the distance
distribution, $\{N^t_{w,d}, w^t_{\min,d}\}$, of a causal linear code so that it is anytime reliable
under ML decoding. This sufficient condition is an adaptation of the distance property illustrated in
Fig. 1 to the case of causal linear codes. In Section V, we will demonstrate the existence of causal
linear codes that satisfy this sufficient condition. The thresholds thus obtained will be significantly
tightened in Section VI by invoking some standard results from random coding literature, e.g., [24],
[25].



A. A Sufficient Condition 

Suppose the decoding instant is $t$ and without loss of generality, assume that the all zero
codeword is transmitted, i.e., $c_\tau = 0$ for $\tau \le t$. We are interested in the error event
where the earliest error in estimating $b_\tau$ happens at $\tau = t-d+1$, i.e.,
$\hat{b}_{\tau|t} = 0$ for all $\tau < t-d+1$ and $\hat{b}_{t-d+1|t} \ne 0$. Note that this is
equivalent to the ML codeword, $\hat{c}$, satisfying $\hat{c}_{\tau < t-d+1} = 0$ and
$\hat{c}_{t-d+1} \ne 0$, and $\mathbb{H}^t_{n,R}$ having full rank so that $\hat{c}$ can be uniquely
mapped to a transmitted sequence $\hat{b}$. Then, using a union bound, we have

$$P^e_{t,d} = P\Big(\bigcup_{c \in \mathcal{C}_{t,d}} \{0 \text{ is decoded as } c\}\Big) \le \sum_{c \in \mathcal{C}_{t,d}} P(0 \text{ is decoded as } c) \tag{7}$$



Consider a memoryless binary-input output-symmetric (MBIOS) channel. Let $\mathcal{X}$ and
$\mathcal{Z}$ denote the input and output alphabet respectively. The Bhattacharyya parameter,
$\zeta$, for such a channel is defined as

$$\zeta = \begin{cases} \int \sqrt{p(z|X=1)\,p(z|X=0)}\,dz & \text{if } \mathcal{Z} \text{ is continuous} \\ \sum_{z \in \mathcal{Z}} \sqrt{p(z|X=1)\,p(z|X=0)} & \text{if } \mathcal{Z} \text{ is discrete valued} \end{cases}$$

Now, it is well known (e.g., see [26]) that, under ML decoding

$$P(0 \text{ is decoded as } c) \le \zeta^{\|c\|}$$
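For a discrete-output channel the definition above is a finite sum and is easy to evaluate. The sketch below is illustrative only: the dictionary representation of the transition probabilities and the names `bhattacharyya`, `bec` and `bsc` are assumptions of this example.

```python
from math import sqrt


def bhattacharyya(p_given_0, p_given_1):
    """zeta = sum_z sqrt(p(z|X=0) * p(z|X=1)) for a discrete-output channel.
    Each argument maps an output symbol to its conditional probability."""
    outputs = set(p_given_0) | set(p_given_1)
    return sum(sqrt(p_given_0.get(z, 0.0) * p_given_1.get(z, 0.0)) for z in outputs)


def bec(eps):
    """Binary erasure channel: '?' marks an erasure."""
    return {0: 1 - eps, "?": eps}, {1: 1 - eps, "?": eps}


def bsc(eps):
    """Binary symmetric channel with crossover probability eps."""
    return {0: 1 - eps, 1: eps}, {0: eps, 1: 1 - eps}


if __name__ == "__main__":
    print(bhattacharyya(*bec(0.15)))   # equals the erasure probability
    print(bhattacharyya(*bsc(0.05)))   # equals 2 * sqrt(eps * (1 - eps))
```

For the erasure channel only the erasure symbol is reachable from both inputs, so the sum collapses to the erasure probability; for the symmetric channel both outputs contribute equally.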



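From (7), it follows that $P^e_{t,d} \le \sum_{w^t_{\min,d} \le w \le nd} N^t_{w,d}\, \zeta^w$. If
$w^t_{\min,d} \ge \alpha n d$ and $N^t_{w,d} \le 2^{\theta w}$ for some
$\theta < \log_2(1/\zeta)$, then

$$P^e_{t,d} \le \eta\, 2^{-\alpha n d (\log_2(1/\zeta) - \theta)} \tag{8}$$

where $\eta = \left(1 - 2^{\theta - \log_2(1/\zeta)}\right)^{-1}$. So, an obvious sufficient condition
for $\mathbb{H}_{n,R}$ can be described in terms of $w^t_{\min,d}$ and $N^t_{w,d}$ as follows. For
some $\theta < \log_2(1/\zeta)$, we need

$$w^t_{\min,d} \ge \alpha n d, \quad \forall\, t,\ d \ge d_o \tag{9a}$$
$$N^t_{w,d} \le 2^{\theta w}, \quad \forall\, t,\ d \ge d_o \tag{9b}$$

where $d_o$ is a constant that is independent of $d, t$. This brings us to the following definition.

Definition 3 (Anytime distance and Anytime reliability): We say that a code $\mathbb{H}_{n,R}$ has
$(\alpha, \theta, d_o)$-anytime distance, if the following hold

1) $\mathbb{H}^t_{n,R}$ is full rank for all $t > 0$

2) $w^t_{\min,d} \ge \alpha n d$ and $N^t_{w,d} \le 2^{\theta w}$ for all $t > 0$ and $d \ge d_o$.
□

We require that $\mathbb{H}^t_{n,R}$ have full rank so that the mapping from the source bits
$b_{1:t}$ to coded bits $c_{1:t}$ is invertible. We summarize the preceding discussion as the
following Lemma.

Lemma 4.1: If a code $\mathbb{H}_{n,R}$ has $(\alpha, \theta, d_o)$-anytime distance, then it is
$(R, \beta, d_o)$-anytime reliable under ML decoding over a channel with Bhattacharyya parameter
$\zeta$ where

$$\beta = \alpha\left(\log(1/\zeta) - \theta\right)$$ □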
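The step from the union bound to (8) is a geometric series, which can be checked numerically. The following sketch assumes the worst case allowed by (9a) and (9b), namely $N^t_{w,d} = 2^{\theta w}$ for every $w \ge \alpha n d$; the function names and the parameter values in the usage lines are hypothetical.

```python
from math import ceil, log2


def anytime_exponent(alpha, theta, zeta):
    """Return (beta, eta) as in Lemma 4.1 and (8); needs theta < log2(1/zeta)."""
    gap = log2(1 / zeta) - theta
    if gap <= 0:
        raise ValueError("theta must be smaller than log2(1/zeta)")
    beta = alpha * gap
    eta = 1 / (1 - 2 ** (-gap))
    return beta, eta


def union_bound(alpha, theta, zeta, n, d):
    """Sum of 2^(theta w) * zeta^w over ceil(alpha n d) <= w <= n d,
    i.e., the union bound with N_{w,d} at its largest allowed value."""
    w_min = ceil(alpha * n * d)
    return sum(2 ** (theta * w) * zeta ** w for w in range(w_min, n * d + 1))


if __name__ == "__main__":
    alpha, theta, zeta, n = 0.1, 0.5, 0.15, 10
    beta, eta = anytime_exponent(alpha, theta, zeta)
    for d in range(1, 6):
        # The left value never exceeds the right one.
        print(d, union_bound(alpha, theta, zeta, n, d), eta * 2 ** (-n * beta * d))
```

The ratio of the series is $2^{\theta}\zeta < 1$, so the sum is dominated by its first term, and $\eta$ is simply the factor $1/(1 - 2^{\theta}\zeta)$ that accounts for the tail.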

V. Linear Anytime Codes - Existence 
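Consider causal linear codes with the following Toeplitz structure

$$\mathbb{H}^{TZ}_{n,R} = \begin{bmatrix} H_1 & & & & \\ H_2 & H_1 & & & \\ \vdots & \vdots & \ddots & & \\ H_\tau & H_{\tau-1} & \cdots & H_1 & \\ \vdots & \vdots & & \vdots & \ddots \end{bmatrix}$$

The superscript TZ in $\mathbb{H}^{TZ}_{n,R}$ denotes 'Toeplitz'. $\mathbb{H}^{TZ}_{n,R}$ is obtained
from $\mathbb{H}_{n,R}$ in (6) by setting $H_{ij} = H_{i-j+1}$ for $i \ge j$. Due to the Toeplitz
structure, we have the following invariance: $w^t_{\min,d} = w^{t'}_{\min,d}$ and
$N^t_{w,d} = N^{t'}_{w,d}$ for all $d \le \min(t, t')$. The code $\mathbb{H}^{TZ}_{n,R}$ will be
referred to as a time-invariant code. The notion of time invariance is analogous to the convolutional
structure used to show the existence of infinite tree codes in [14]. This time invariance allows one
to prove that such codes which are anytime reliable are abundant.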

Definition 4 (The ensemble $\mathbb{TZ}_p$): The ensemble $\mathbb{TZ}_p$ of time-invariant codes,
$\mathbb{H}^{TZ}_{n,R}$, is obtained as follows: $H_1$ is any fixed full rank binary matrix and for
$\tau \ge 2$, the entries of $H_\tau$ are chosen i.i.d. according to Bernoulli($p$), i.e., each entry
is 1 with probability $p$ and 0 otherwise. □
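Drawing a code from this ensemble and assembling a finite leading minor of the Toeplitz parity check matrix can be sketched as follows. The choice $H_1 = [I \;\, 0]$ is only one convenient full rank matrix picked for the example, and the function names are hypothetical.

```python
import random


def sample_toeplitz_blocks(n_bar, n, t, p, rng):
    """Return [H_1, ..., H_t] with each block of size n_bar x n.
    H_1 = [I | 0] is an assumed fixed full rank choice;
    H_tau for tau >= 2 has i.i.d. Bernoulli(p) entries."""
    h1 = [[1 if j == i else 0 for j in range(n)] for i in range(n_bar)]
    rest = [[[1 if rng.random() < p else 0 for _ in range(n)]
             for _ in range(n_bar)] for _ in range(t - 1)]
    return [h1] + rest


def toeplitz_parity_check(blocks):
    """Assemble the (n_bar t) x (n t) leading principal minor:
    block (i, j) is H_{i-j+1} for i >= j and zero otherwise."""
    t = len(blocks)
    n_bar, n = len(blocks[0]), len(blocks[0][0])
    H = [[0] * (n * t) for _ in range(n_bar * t)]
    for i in range(t):
        for j in range(i + 1):
            blk = blocks[i - j]
            for r in range(n_bar):
                H[i * n_bar + r][j * n:(j + 1) * n] = blk[r]
    return H


if __name__ == "__main__":
    rng = random.Random(1)
    blocks = sample_toeplitz_blocks(n_bar=2, n=4, t=3, p=0.5, rng=rng)  # R = 1/2
    for row in toeplitz_parity_check(blocks):
        print(row)
```

Only $t$ blocks need to be stored to describe the first $t$ block rows, which is the practical appeal of the time-invariant structure.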

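For the ensemble $\mathbb{TZ}_p$, we have the following result.

Theorem 5.1 (Abundance of time-invariant codes): Let $\bar{p} = \min\{p, 1-p\}$. Then, for each
$R > 0$ and

$$\alpha < H^{-1}\big((1-R)\log(1/(1-\bar{p}))\big), \qquad \theta > -\log\big[(1-\bar{p})^{-(1-R)} - 1\big],$$

we have

$$P\left(\mathbb{H}^{TZ}_{n,R} \text{ has } (\alpha, \theta, d_o)\text{-anytime distance}\right) \ge 1 - 2^{-\Omega(n d_o)}$$

Proof: See Appendix A. ■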
We can now use this result to demonstrate an achievable region of rate-exponent pairs for
a given channel, i.e., the set of rates $R$ and exponents $\beta$ such that one can guarantee
$(R, \beta)$-anytime reliability using linear codes. Note that the thresholds in Theorem 5.1 are
optimal when $p = 1/2$. So, for the rest of the analysis we fix $p = 1/2$. To determine the values of
$R$ that will satisfy (8), note that we need

$$\log\left(1/(2^{1-R} - 1)\right) < \log(1/\zeta) \iff R < 1 - \log(1 + \zeta)$$

With this observation, we have the following Corollary.

Corollary 5.2: For any rate $R$ and exponent $\beta$ such that

$$R < 1 - \log(1 + \zeta), \quad \text{and}$$
$$\beta < H^{-1}(1-R)\left(\log(1/\zeta) + \log(2^{1-R} - 1)\right),$$

if $\mathbb{H}^{TZ}_{n,R}$ is chosen from $\mathbb{TZ}_{1/2}$, then

$$P\left(\mathbb{H}^{TZ}_{n,R} \text{ is } (R, \beta, d_o)\text{-anytime reliable}\right) \ge 1 - 2^{-\Omega(n d_o)}$$
□

Note that for BEC($\epsilon$), $\zeta = \epsilon$ and for BSC($\epsilon$),
$\zeta = 2\sqrt{\epsilon(1-\epsilon)}$. The constant in the exponent $\Omega(n d_o)$ in Corollary 5.2
can be computed explicitly and it decreases to zero if either the rate or the exponent approach their
respective thresholds. Further note that almost every code in the ensemble is
$(R, \beta)$-anytime reliable after a large enough initial delay $d_o$.
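To get a feel for the region described by Corollary 5.2, the thresholds can be evaluated numerically. The sketch below computes $H^{-1}$ by bisection; that numerical method, the function names and the sample channel parameters are assumptions of this illustration rather than part of the analysis.

```python
from math import log2, sqrt


def binary_entropy(x):
    """H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def inverse_entropy(y, tol=1e-12):
    """Smaller root of H(x) = y, found by bisection on [0, 1/2]."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rate_threshold(zeta):
    """Largest rate allowed by Corollary 5.2: R < 1 - log2(1 + zeta)."""
    return 1 - log2(1 + zeta)


def exponent_threshold(R, zeta):
    """Supremum of beta in Corollary 5.2 for a rate R below the rate threshold."""
    if not 0 < R < rate_threshold(zeta):
        raise ValueError("R must lie in (0, 1 - log2(1 + zeta))")
    return inverse_entropy(1 - R) * (log2(1 / zeta) + log2(2 ** (1 - R) - 1))


if __name__ == "__main__":
    zeta_bec = 0.15                        # BEC(0.15)
    zeta_bsc = 2 * sqrt(0.05 * 0.95)       # BSC(0.05)
    for name, zeta in (("BEC", zeta_bec), ("BSC", zeta_bsc)):
        print(name, rate_threshold(zeta), exponent_threshold(0.3, zeta))
```

The exponent threshold shrinks to zero as $R$ approaches $1 - \log(1+\zeta)$, since the second factor vanishes there.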



The thresholds in Corollary 5.2 have been obtained by using a simple union bound for
bounding the error probability in (7). As one would expect, these thresholds can be improved
by doing a more careful analysis. It turns out that the ensemble of random causal linear codes
bears close resemblance to random linear block codes. This allows one to borrow results from
the random coding literature to tighten the thresholds.

VI. Improving the Thresholds

We will examine the Toeplitz ensemble more closely and show that its delay dependent distance
distribution is bounded above by that of the random binary linear code ensemble, which we will
define shortly. This will enable us to significantly improve the rate and exponent thresholds of
Section V that were obtained using a simple union bound.

A. A Brief Recap of Random Coding 

For an arbitrary discrete memoryless channel, recall the following familiar definition of the
random coding exponent, $E_r(R)$, from [24]³

$$E_r(R) = \max_{0 \le \rho \le 1} \max_{Q} \left[E_0(\rho, Q) - \rho R\right], \text{ where} \tag{11a}$$
$$E_0(\rho, Q) = -\log \sum_{z} \left[\sum_{x} Q(x)\, p(z|X=x)^{\frac{1}{1+\rho}}\right]^{1+\rho} \tag{11b}$$

In (11b), $Q(\cdot)$ denotes a distribution on the channel input alphabet. The ensemble of random
binary linear codes with block length $N$ and rate $R = K/N$ is obtained by choosing an
$(N-K) \times N$ binary parity check matrix $H$, i.e., $H \in GF_2^{(N-K) \times N}$, each of whose
entries is chosen i.i.d. Bernoulli(1/2). For such an ensemble, any non-zero binary word
$c \in GF_2^N$ is a codeword with probability $2^{-N(1-R)}$. For a given block code, let $w_{\min}$
denote the minimum distance and $N_w$ the number of codewords with Hamming weight $w$. A quick
calculation shows that $E N_w = \binom{N}{w} 2^{-N(1-R)}$ and that $w_{\min}$ grows like
$H^{-1}(1-R)N$ with a high probability. A typical code in this ensemble is defined to be one that has
$w_{\min} \approx H^{-1}(1-R)N$ and $N_w \approx \binom{N}{w} 2^{-N(1-R)}$. A simple Markov inequality
shows that the probability that a code from this ensemble is atypical is at most $2^{-\Omega(N)}$.
For the typical code over BSC($\epsilon$), the block error probability decays as
$2^{-N E_{bsc}(R)}$ where the exponent $E_{bsc}$ has been characterized in [25]. As has been noted in
[25], these calculations can be easily extended to a wider class of channels. In particular, the
class of MBIOS channels admits a particularly clean characterization. We present the following
generalization of the result in [25] without proof.

³ We use base-2 instead of the natural logarithm.

Lemma 6.1: Consider a linear code with block length $N$, rate $R$ and distance distribution
$\{N_w\}_{w=1}^{N}$ such that

1) $N_w = 0$ if $w < H^{-1}(1-R-\delta)N$

2) $N_w \le 2^{-N(1-R-\delta+o(1))} \binom{N}{w}$

for some $\delta > 0$. Let the channel be a MBIOS channel with Bhattacharyya parameter $\zeta$. Then
the block error probability, $P_e$, under ML decoding is bounded as

$$P_e \le 2^{-N(E_\zeta(R) - \delta')}$$

where

$$E_\zeta(R) = H^{-1}(1-R)\log\frac{1}{\zeta}, \quad 0 \le R \le 1 - H\left(\frac{\zeta}{1+\zeta}\right)$$

and $\delta' \to 0$ as $\delta \to 0$. □

Proof: The proof is a straightforward generalization of the result in [25]. ■



B. The Toeplitz Ensemble 

In the causal case, fix an arbitrary decoding instant $t$ and consider the event that the earliest
error happens at a delay $d$. As seen before, the associated error probability depends on the
relevant codebook $\mathcal{C}_{t,d}$ and its distance distribution $\{N^t_{w,d}\}_{w=1}^{nd}$.
Recall from Table II that

$$\mathcal{C}_{t,d} = \{c \in \mathcal{C}_t : c_{\tau < t-d+1} = 0,\ c_{t-d+1} \ne 0\}$$

Due to the Toeplitz structure, we have $\mathcal{C}_{t,d} = \mathcal{C}_{d,d}$. So, we drop the index
$t$ in $N^t_{w,d}$ and write it as $N_{w,d}$. Note that $\mathcal{C}_{d,d}$ is determined by the
matrix $\mathbb{H}^d_{n,R}$. Let $c$ be a given $nd$-dimensional binary word, i.e.,
$c \in GF_2^{nd}$, and write $c = [c_1^T, \ldots, c_d^T]^T$, where $c_\tau \in GF_2^n$ notionally
corresponds to the $n$ encoder output bits during the $\tau$-th time slot. Suppose $c_1 \ne 0$, then
it is easy to see that

$$P\left(\mathbb{H}^d_{n,R}\, c = 0\right) = 2^{-\bar{n} d}$$

Recall that $\bar{n} = n(1-R)$.

Now observe that $E N_{w,d} \le \binom{nd}{w} 2^{-\bar{n} d}$. This is the same as the average weight
distribution of the random binary linear code with a block length $nd$ and rate $R$. So, applying
Lemma 6.1 we get the following result.

Theorem 6.2: For each rate $R < C$ and exponent $\beta < E_\zeta(R)$, if $\mathbb{H}^{TZ}_{n,R}$ is
chosen from $\mathbb{TZ}_{1/2}$, then

$$P\left(\mathbb{H}^{TZ}_{n,R} \text{ is } (R, \beta, d_o)\text{-anytime reliable}\right) \ge 1 - 2^{-\Omega(n d_o)}$$

where $C$ is the Shannon capacity of the channel and

$$E_\zeta(R) = \begin{cases} H^{-1}(1-R)\log\frac{1}{\zeta}, & 0 \le R \le 1 - H\left(\frac{\zeta}{1+\zeta}\right) \\ E_r(R), & 1 - H\left(\frac{\zeta}{1+\zeta}\right) \le R \le C \end{cases} \tag{14}$$
□

The problem of stabilizing unstable scalar linear systems over noisy channels in the absence
of feedback has been considered in [13]. [13] showed the existence of $(R, \beta)$-anytime reliable
codes for $R < C$ and $\beta < E_r(R)$. The code is not linear in general and the existence was
not with high probability. Theorem 6.2 proves the existence of linear anytime reliable codes for
exponent, $\beta$, up to $E_\zeta(R)$. When $R < 1 - H\left(\frac{\zeta}{1+\zeta}\right)$,
$E_\zeta(R) > E_r(R)$. So, Theorem 6.2 marks a significant improvement in the known thresholds for
stabilizing unstable processes over noisy channels, as is demonstrated in Figures 3 and 4.

Fig. 3. Comparing the thresholds obtained from Theorem 6.2 and Theorem 5.2 in [13], plotted against
the rate $R$. (a) Binary Erasure Channel, $\epsilon = 0.15$. (b) Binary Symmetric Channel,
$\epsilon = 0.05$.

VII. Decoding over the Binary Erasure Channel

Owing to the simplicity of the erasure channel, it is possible to come up with an efficient
way to perform maximum likelihood decoding at each time step. Consider an arbitrary decoding
instant $t$, let $c = [c_1^T, \ldots, c_t^T]^T$ be the transmitted codeword and let
$z = [z_1^T, \ldots, z_t^T]^T$ denote the corresponding channel outputs. Recall that
$\mathbb{H}^t_{n,R}$ denotes the $\bar{n}t \times nt$ leading principal minor of $\mathbb{H}_{n,R}$.
Let $z_e$ denote the erasures in $z$ and let $H_e$ denote the columns of $\mathbb{H}^t_{n,R}$ that
correspond to the positions of the erasures. Also, let $\tilde{z}_e$ denote the unerased entries of
$z$ and let $\tilde{H}_e$ denote the columns of $\mathbb{H}^t_{n,R}$ excluding $H_e$. So, we have the
following parity check condition on $z_e$: $H_e z_e = \tilde{H}_e \tilde{z}_e$. Since $\tilde{z}_e$
is known at the decoder, $s = \tilde{H}_e \tilde{z}_e$ is known. Maximum likelihood decoding boils
down to solving the linear equation $H_e z_e = s$. Due to the lower triangular nature of $H_e$,
unlike in the case of traditional block coding, this equation will typically not have a unique
solution, since $H_e$ will typically not have full column rank. This is acceptable because we are not
interested in decoding the entire $z_e$ correctly; we only care about decoding the earlier entries
accurately. If $z_e = [z_{e,1}^T \;\, z_{e,2}^T]^T$, then $z_{e,1}$ corresponds to the earlier time
instants while $z_{e,2}$ corresponds to the later time instants. The desired reliability requires one
to recover $z_{e,1}$ with an exponentially smaller error probability than $z_{e,2}$. Since $H_e$ is
lower triangular, we can write $H_e z_e = s$ as



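$$\begin{bmatrix} H_{e,11} & 0 \\ H_{e,21} & H_{e,22} \end{bmatrix} \begin{bmatrix} z_{e,1} \\ z_{e,2} \end{bmatrix} = \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} \tag{15}$$

Let $H^\perp_{e,22}$ denote the orthogonal complement of $H_{e,22}$, i.e.,
$H^\perp_{e,22} H_{e,22} = 0$. Then multiplying both sides of (15) with
$\mathrm{diag}(I, H^\perp_{e,22})$, we get

$$\begin{bmatrix} H_{e,11} \\ H^\perp_{e,22} H_{e,21} \end{bmatrix} z_{e,1} = \begin{bmatrix} s_1 \\ H^\perp_{e,22}\, s_2 \end{bmatrix} \tag{16}$$

If $[H_{e,11}^T \;\, (H^\perp_{e,22} H_{e,21})^T]^T$ has full column rank, then $z_{e,1}$ can be
recovered exactly. The decoding algorithm now suggests itself, i.e., find the smallest possible
$H_{e,22}$ such that $[H_{e,11}^T \;\, (H^\perp_{e,22} H_{e,21})^T]^T$ has full rank, and it is
outlined in Algorithm 1. Note that one can equivalently describe the decoding algorithm in terms of
the generator matrix and it will be very similar to Alg. 1.

Algorithm 1 Decoder for the BEC

1) Suppose, at time $t$, the earliest uncorrected error is at a delay $d$. Identify $z_e$ and $H_e$
as defined above.

2) Starting with $d' = 1, 2, \ldots, d$, partition

$$z_e = [z_{e,1}^T \;\, z_{e,2}^T]^T \quad \text{and} \quad H_e = \begin{bmatrix} H_{e,11} & 0 \\ H_{e,21} & H_{e,22} \end{bmatrix}$$

where $z_{e,2}$ correspond to the erased positions up to delay $d'$.

3) Check whether the matrix $\begin{bmatrix} H_{e,11} \\ H^\perp_{e,22} H_{e,21} \end{bmatrix}$ has
full column rank.

4) If so, solve for $z_{e,1}$ in the system of equations

$$\begin{bmatrix} H_{e,11} \\ H^\perp_{e,22} H_{e,21} \end{bmatrix} z_{e,1} = \begin{bmatrix} s_1 \\ H^\perp_{e,22}\, s_2 \end{bmatrix}$$

5) Increment $t = t + 1$ and continue.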
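The effect of the decoder can be sketched with plain Gaussian elimination over GF(2). Note that this is a closely related swapped-in technique, not a line-by-line transcription of Algorithm 1: instead of forming $H^\perp_{e,22}$ and searching over $d'$, the sketch row-reduces $[H_e \,|\, s]$ and fills in every erased bit that the parity checks determine uniquely. The function names, the use of `None` for erasures and the toy parity check matrix are assumptions of this example.

```python
def gf2_rref(A, s):
    """Row-reduce the augmented system [A | s] over GF(2).
    Returns (rows, pivots): reduced rows and the pivot column of each leading row."""
    rows = [r[:] + [b] for r, b in zip(A, s)]
    m = len(A[0]) if A else 0
    pivots, r = [], 0
    for col in range(m):
        pr = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if pr is None:
            continue                                   # free column
        rows[r], rows[pr] = rows[pr], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    return rows, pivots


def decode_erasures(H, z):
    """H: parity check rows (lists of 0/1). z: received word, None marks an erasure.
    Returns a copy of z with every uniquely determined erased bit filled in."""
    erased = [i for i, v in enumerate(z) if v is None]
    H_e = [[row[i] for i in erased] for row in H]
    # Syndrome contributed by the unerased positions.
    s = [sum(row[i] & z[i] for i in range(len(z)) if z[i] is not None) % 2 for row in H]
    rows, pivots = gf2_rref(H_e, s)
    out = list(z)
    for r, col in enumerate(pivots):
        # A pivot bit is determined only if its row involves no free erased bit.
        if sum(rows[r][:-1]) == 1:
            out[erased[col]] = rows[r][-1]
    return out


if __name__ == "__main__":
    # Toy Toeplitz code with n = 2, n_bar = 1: H_1 = [1 1], H_2 = [1 0], H_3 = [0 1].
    H = [[1, 1, 0, 0, 0, 0],
         [1, 0, 1, 1, 0, 0],
         [0, 1, 1, 0, 1, 1]]
    print(decode_erasures(H, [None, 1, 1, 0, None, None]))
```

In the toy run the early erased bit is recovered while the two most recent erased bits remain undetermined, which mirrors the point made above: the system need not have a unique solution, only the earlier entries must be pinned down.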

A. Encoding and Decoding Complexity 

Consider the decoding instant $t$ and suppose that the earliest uncorrected erasure is at time $t - d + 1$. Then steps 2) and 3) in Algorithm 1 can be accomplished by just reducing $H_e$ into the appropriate row echelon form, which has complexity $O(d^3)$. That the earliest entry in $z_e$ is at time $t - d + 1$ implies that it was not corrected at time $t - 1$, the probability of which is $P^e_{d-1,t-1} \le \eta 2^{-n\beta(d-1)}$. Hence, if nothing more had to be done, the average decoding complexity would have been at most $K \sum_{d>0} d^3 2^{-n\beta d}$, which is bounded and is independent of $t$. In particular, the probability of the decoding complexity being $Kd^3$ would have been at most $\eta 2^{-n\beta d}$. But, in order to actually solve for $z_{e,1}$ in step 4), one needs to compute the syndromes $s_1$ and $s_2$. It is easy to see that the complexity of this operation increases linearly in time $t$. This is to be expected since the code has infinite memory. A similar computational complexity also plagues the encoder, for, the encoding operation at time $t$ is described by $c_t = G_t b_1 + \ldots + G_1 b_t$ where $\{b_i\}$ denote the source bits and hence becomes progressively harder with $t$.
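To see why a cubic cost per decoding attempt is harmless when delays are exponentially unlikely, the series $\sum_{d>0} d^3 2^{-n\beta d}$ can be evaluated numerically. This is a small sketch; the function name, the constant `K` and the truncation length are illustrative assumptions.

```python
def expected_complexity_bound(n_beta, K=1.0, terms=200):
    """Truncated value of K * sum_{d>=1} d^3 * 2^(-n_beta * d)."""
    return K * sum(d ** 3 * 2.0 ** (-n_beta * d) for d in range(1, terms + 1))

# For n*beta = 1 the series has the closed form x(1 + 4x + x^2)/(1 - x)^4 at x = 1/2.
bound_at_one = expected_complexity_bound(1.0)
```

The sum does not depend on $t$, and it shrinks quickly as the exponent $n\beta$ grows.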



We propose the following scheme to circumvent this problem in practice. We allow the decoder to periodically, say at $t = \ell(2T)$ ($\ell = 1, 2, \ldots$) for appropriately chosen $T$, provide feedback to the encoder on the position of the earliest uncorrected erasure which is, say, at time $t - d$. The encoder can use this information to stop encoding the source bits received prior to $t - d$, i.e., $\{b_i\}$ for $i < t - d - 1$, starting from time $t + T$. In other words, for $\tau \ge t + T$, $c_\tau = G_{\tau-t+d+2} b_{t-d-1} + \ldots + G_1 b_\tau$. The decoder accordingly uses the new generator matrix starting from $t + T$. In practice, this translates to an arrangement where the decoder sends feedback at time $t$ and can be sure that the encoder receives it by time $t + T$. Such feedback, in the form of acknowledgements from the receiver to the transmitter, is common to most packet-based modern communication and networked systems for reasonable values of $T$. Note that this form of feedback finds a middle ground between one extreme of having no feedback at all and another extreme where every channel output is fed back to the transmitter, the latter being impractical in most cases. The decoder proposed in Algorithm 1 is easy to implement and its performance is simulated in Section XI.



B. Extension to Packet Erasures 

The encoding and decoding algorithms presented so far have been developed for the case of bit erasures. But it is not difficult to see that the techniques generalize to the case of packet erasures. For example, for a packet length $L$, what was one bit earlier will now be a block of $L$ bits. Each binary entry in the encoding/parity check matrix will now be an $L \times L$ binary matrix. The rate will remain the same. So, at each time, $k$ packets each of length $L$ will be encoded to $n$ packets each of the same length $L$. Recall that the anytime performance of the code is determined by the delay dependent codebook $\mathcal{C}_{t,d}$ and its distance distribution $\{N^t_{w,d}\}$. In the case of packet erasures, one can obtain analogous results by defining the Hamming distance of a codeword slightly differently. By viewing a codeword as a collection of packets, define its Hamming distance to be the number of nonzero packets. The definition of the delay dependent distance distribution $\{N^t_{w,d}\}$ will change accordingly. With this modification, one can easily apply the results developed in Sections IV, V and the decoding algorithm in Section VII above to the case of packet erasures.



VIII. Sufficient Conditions for Stabilizability - Scalar Measurements 

Recall that we do not assume any feedback about the channel outputs or the control inputs at the observer/encoder. This is the setup we imply whenever we say that no feedback is assumed. In this context, [13] derives a sufficient condition for stabilizing scalar linear systems over noisy channels without feedback while [27] considers stabilizing vector valued processes in the presence of feedback. So, to the best of our knowledge, there are no results on stabilizing unstable vector valued processes over a noisy channel when the observer does not have access to either the control inputs or the channel outputs.

We will develop two sufficient conditions for stabilizing vector valued processes over noisy channels without feedback. The two sufficient conditions are based on two different estimation algorithms employed by the controller and neither is stronger than the other. We will then show in Section X-A that both sufficient conditions are asymptotically tight. For ease of presentation, we will treat the case of scalar and vector measurements separately. We will present the sufficient conditions for the case of scalar measurements here while vector measurements will be treated in Section IX.

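Consider the unstable $m_x$-dimensional linear state space model in (3) with scalar measurements, i.e., $\rho(F) > 1$ and $m_y = 1$. Suppose that the characteristic polynomial of $F$ is given by

\[
f(z) = z^{m_x} + a_1 z^{m_x - 1} + \ldots + a_{m_x}
\]

Without loss of generality we assume that $(F, H)$ are in the following canonical form.

\[
F = \begin{bmatrix}
-a_1 & 1 & 0 & \cdots & 0 \\
-a_2 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
-a_{m_x-1} & 0 & 0 & \cdots & 1 \\
-a_{m_x} & 0 & 0 & \cdots & 0
\end{bmatrix},
\quad
H = [1, 0, \ldots, 0]
\]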



Owing to the duality between estimation and control, we can focus on the problem of tracking (3) over a noisy communication channel. For, if (3) can be tracked with an asymptotically finite mean squared error and if $(F, G)$ is stabilizable, then it is a simple exercise to see that there exists a control law $\{u_t\}$ that will stabilize the plant in the mean squared sense, i.e., $\limsup_t E\|x_t\|^2 < \infty$. In particular, if the control gain $K$ is chosen such that $F + GK$ is stable, then $u_t = K\hat{x}_{t|t}$ will stabilize the plant, where $\hat{x}_{t|t}$ is the estimate of $x_t$ using channel outputs up to time $t$. In control parlance, this amounts to verifying that the control input does not have a dual effect [28]. Hence, in the rest of the analysis, we will focus on tracking (3). The control input $u_t$ therefore is assumed to be absent, i.e., $u_t = 0$.



A. Hypercuboidal Filter 

We bound the set of all possible states that are consistent with the estimates of the quantized measurements using a hypercuboid, i.e., a region of the form $\{x \in \mathbb{R}^{m_x} \mid a \le x \le b\}$, where $a, b \in \mathbb{R}^{m_x}$ and the inequalities are component-wise.

Since we assume that the initial state $x_0$ has bounded support, we can write $x_{\min,0|-1} \le x_0 \le x_{\max,0|-1}$. Suppose that, using the channel outputs received till time $t - 1$, we have $x_{\min,t|t-1} \le x_t \le x_{\max,t|t-1}$. Since $H = [1, 0, \ldots, 0]$, the measurement update provides information of the form $x^{(1)}_{\min,t|t} \le x^{(1)}_t \le x^{(1)}_{\max,t|t}$ while there will be no additional information on the other components of $x_t$. Note that an estimate of the state is given by the mid point of this region, i.e., $\hat{x}_{t|t} = 0.5(x_{\min,t|t} + x_{\max,t|t})$. If we define $\Delta_{t|t} = x_{\max,t|t} - x_{\min,t|t}$, then the estimation error is asymptotically bounded if every component of $\Delta_{t|t}$ is asymptotically bounded. Using such a filter, we can stabilize the system in the mean squared sense over a noisy channel provided that the rate $R$ and exponent $\beta$ of the $(R, \beta)$-anytime reliable code used to encode the measurements satisfy the following sufficient condition.

Theorem 8.1: It is possible to stabilize (3) in the mean squared sense with an $(R, \beta)$-anytime code provided

\[
R > R_n = \frac{1}{n} \log_2 \sum_{i=1}^{m_x} |a_i|, \qquad
\beta > \beta_n = \frac{2}{n} \log_2 \rho(\bar{F})
\tag{17}
\]

Proof: See Appendix B. ■
Before proceeding further, we will provide a brief sketch of the proof. Note that $\Delta_{t|t} = x_{\max,t|t} - x_{\min,t|t}$ is a measure of the uncertainty in the state estimate. From Lemma A.2, $\Delta_{t+1|t} = \bar{F}\Delta_{t|t} + W 1_{m_x}$. The anytime exponent is determined by the growth of $\Delta_t$ in the absence of measurements, hence the bound $\beta_n = \frac{2}{n}\log_2 \rho(\bar{F})$. The bound on the rate is determined by how fine the quantization needs to be for $\Delta_t$ to be bounded asymptotically. It will be shown in Section E that $\rho(\bar{F})$ is never smaller than $\rho(F)$. By using an alternate filtering algorithm, which we call the Ellipsoidal filter, one can improve this requirement on the exponent from $\beta_n > \frac{2}{n}\log_2 \rho(\bar{F})$ to $\beta_n > \frac{2}{n}\log_2 \rho(F)$. But this will come at the price of a larger rate.
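The role of the exponent in this sketch can be illustrated numerically by iterating the time update alone, with no measurement updates, and observing the per-step growth of the uncertainty. This is an illustrative sketch only: it assumes that $\bar{F}$ denotes the entry-wise absolute value of $F$, and it uses a hypothetical 3-dimensional companion-form matrix with first column $(2, 0.25, -0.5)$; the function name `growth_rate` is likewise an assumption.

```python
import numpy as np

def growth_rate(F, W=1.0, steps=60):
    """Iterate width <- |F| @ width + W and return the last per-step growth factor."""
    F_bar = np.abs(np.asarray(F, dtype=float))   # assumed meaning of F-bar
    width = np.ones(F_bar.shape[0])
    for _ in range(steps):
        prev = width
        width = F_bar @ prev + W                 # time update without any measurement
    return float(width[0] / prev[0])

F_example = [[2.0, 1.0, 0.0],
             [0.25, 0.0, 1.0],
             [-0.5, 0.0, 0.0]]
g = growth_rate(F_example)
exponent_needed = 2 * np.log2(g)                 # bits of exponent per time step
```

The growth factor approaches the spectral radius of the entry-wise absolute value of the matrix, so the squared uncertainty grows by roughly $2\log_2 \rho(\bar{F})$ bits per step whenever decoding is delayed.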



B. Ellipsoidal Filter 

One can alternately bound the set of all possible states that are consistent with the estimates of the quantized measurements using an ellipsoid

\[
\mathcal{E}(P, c) = \{x \in \mathbb{R}^{m_x} \mid \langle x - c, P^{-1}(x - c) \rangle \le 1\}
\]

This can be seen as an extension of the technique proposed in [29] to filtering using quantized measurements. If $m_x = 1$, $\rho(\bar{F}) = \rho(F)$. So, let $m_x \ge 2$.

Let $x_0 \in \mathcal{E}(P_0, 0)$ and suppose that, using the channel outputs received till time $t - 1$, we have $x_t \in \mathcal{E}(P_{t|t-1}, \hat{x}_{t|t-1})$. Since $H = [1, 0, \ldots, 0]$, the measurement update provides information of the form $x^{(1)}_{\min,t|t} \le x^{(1)}_t \le x^{(1)}_{\max,t|t}$, which one may call a slab. $\mathcal{E}(P_{t|t}, \hat{x}_{t|t})$ would then be an ellipsoid that contains the intersection of the above slab with $\mathcal{E}(P_{t|t-1}, \hat{x}_{t|t-1})$; in particular, one can set it to be the minimum volume ellipsoid covering this intersection. Lemma A.4 gives a formula for the minimum volume ellipsoid covering the intersection of an ellipsoid and a
slab. For the time update, it is easy to see that for any e' > and P^+i = (1 + e')FPt\tF^ + 
^Im^l^^, £{Pt+i, Fxt\t) contains the state Xt+i whenever £{Pt\t, Xt\t) contains Xj. This leads to 
the following Lemma, the proof of which is contained in the discussion above. For convenience, 
we write Pt for Pt\t-i- 

Lemma 8.2 (The Ellipsoidal Filter): Whenever S{Po,0) contains xq, for each e' > 0, the 
following filtering equations give a sequence of ellipsoids [£(P^t, Xt\t)} that, at each time t, 
contain Xt- 

Pt+i = (1 + e')FPt\tF^ + —1™., xt+i = Fxt\t (18a) 
Pt\t = btPt - [bt - atj^rB ' ^At = ^t^=== (18b) 



where a^, ht and can be calculated in closed form using Lemma A. 4 and ei is the m^.— dimensional 
unit vector ei = [1, 0, . . . , 0]^. 

Using this approach, we get the following sufficient condition. 



Theorem 8.3: It is possible to stabilize (3) for $m_x \ge 2$ in the mean squared sense with an $(R, \beta)$-anytime code provided



R > Re,n 



n 

2 

n 



log2 



i=l 



^Og2 p{F) 



(19a) 
(19b) 



where 6 = ./^^ 

Proof: See Appendix |D] ■ 

IX. Sufficient Conditions for Stabilizability - Vector Measurements 

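Like in the scalar case, we will assume without loss of generality that $(F, H)$ are in a canonical form (obtained from a simple transformation of Scheme I in Sec 6.4.6 of [30]) with the following structure. $F$ is a $q \times q$ block lower triangular matrix with $F^{i,j}$ denoting the $(i,j)^{th}$ block. So, $F^{i,j} = 0$ if $j > i$. $F^{i,j}$ is an $\ell_i \times \ell_j$ matrix and $\sum_{i=1}^{q} \ell_i = m_x$. The diagonal blocks $F^{i,i}$ have the following structure

\[
F^{i,i} = \begin{bmatrix}
-a_{i,1} & 1 & 0 & \cdots & 0 \\
-a_{i,2} & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
-a_{i,\ell_i - 1} & 0 & 0 & \cdots & 1 \\
-a_{i,\ell_i} & 0 & 0 & \cdots & 0
\end{bmatrix}
\]

while the off-diagonal blocks do not have any specific structure. The measurement matrix $H$ is of the form $H = [H_1^T \; H_2^T]^T$ where $H_1$ is a $q \times m_x$ matrix of the following form

\[
H_1 = \mathrm{block\ diag}\left\{ [1 \; 0 \; \cdots \; 0]_{1 \times \ell_i},\ i = 1, \ldots, q \right\}
\tag{20}
\]

$H_2$ does not have any particular structure and is not relevant. Note that the characteristic polynomial of $F$ is given by $f(z) = \prod_{i=1}^{q} \left( z^{\ell_i} + a_{i,1} z^{\ell_i - 1} + \ldots + a_{i,\ell_i} \right)$.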



If the Hypercuboidal filter is used, then Theorem 8.1 can be extended to the case of vector measurements as follows.

Theorem 9.1: It is possible to stabilize (3) in the mean squared sense with an $(R, \beta)$-anytime code provided

\[
R > R_{v,n} = \frac{1}{n} \sum_{i=1}^{q} \max\left\{ 0, \log_2 \sum_{j=1}^{\ell_i} |a_{i,j}| \right\}, \qquad
\beta > \beta_{v,n} = \frac{2}{n} \log_2 \rho(\bar{F})
\tag{21a}
\]

Proof: See Appendix B2



The thresholds if one uses an Ellipsoidal filter are given as follows. 

Theorem 9.2: It is possible to stabilize ([3]) in the mean squared sense with an (i?, /?)— anytime 
code provided 

q 



1 

R > Rye,n = - ^ max <^ 0, log 



n 



'm. 



/3>/3.e,n = -log2P(F) (22a) 
n 



where 6 = ./^^ □ 



We skip the proof for Theorem 9.2 since it is very similar to that of Theorem 9.1.



X. Discussion - Asymptotics and the Stabilizable Region 

The sufficient conditions derived above are non-asymptotic in the sense that measurements are encoded every time step. Alternately, one can encode the measurements every, say, $\ell$ time steps, and consider the asymptotic rate and exponent needed as $\ell$ grows. This is often the form in which such sufficient conditions appear in the literature [8], [10], [13]. Even though the sufficient conditions in Sections VIII and IX are non-asymptotic, note that they depend only on the system matrices $F$, $H$ and not on the noise distribution. In order to compare our results with those in the literature, we examine the sufficient conditions in the asymptotic limit of large $\ell$.

A. The Limiting Case

Note that encoding once every $\ell$ measurements amounts to working with the system matrix $F^{\ell}$. So, one can calculate this limiting rate and exponent by writing the eigen values of $F$, $\{\lambda_i\}_{i=1}^{m_x}$, as $\lambda_i = \mu_i^n$ and letting $n$ scale. The following asymptotic result allows us to compare the sufficient conditions above with those in the literature (e.g., see [8], [10], [13]).



Theorem 10.1 (The Limiting Case): Write the eigen values of $F$, $\{\lambda_i\}_{i=1}^{m_x}$, in the form $\lambda_i = \mu_i^n$. Letting $n$ scale, $R_{v,n}$, $R_{e,n}$, $R_{ev,n}$ converge to $R^*$, and $\beta_{e,n}$, $\beta_{ev,n}$ converge to $\beta^*$, where

\[
R^* = \sum_{i : |\mu_i| > 1} \log_2 |\mu_i|, \qquad
\beta^* = 2 \log_2 \max_i |\mu_i|
\tag{23}
\]

Proof: See Appendix E. ■
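The convergence of the rate threshold can be checked numerically for a small example. The sketch below assumes the scalar-measurement threshold takes the form $\frac{1}{n}\log_2 \sum_i |a_i|$, with $a_i$ the characteristic polynomial coefficients of a matrix whose eigen values are $\mu_i^n$; the function names and the values of $\mu_i$ are hypothetical.

```python
import numpy as np

def rate_threshold(mu, n):
    """(1/n) * log2(sum |a_i|) for the monic polynomial with roots mu_i^n."""
    lam = np.asarray(mu, dtype=float) ** n
    coeffs = np.poly(lam)[1:]                 # a_1, ..., a_m (leading 1 dropped)
    return float(np.log2(np.sum(np.abs(coeffs))) / n)

def limiting_rate(mu):
    """R* = sum of log2 |mu_i| over the unstable mu_i."""
    mags = np.abs(np.asarray(mu, dtype=float))
    return float(np.sum(np.log2(mags[mags > 1])))

mu = [1.5, 1.2, 0.8]
r_star = limiting_rate(mu)                    # log2(1.5) + log2(1.2) = log2(1.8)
r_1, r_50 = rate_threshold(mu, 1), rate_threshold(mu, 50)
```

For small $n$ the threshold sits well above $R^*$; as $n$ grows, the largest coefficient (the product of the unstable eigen values) dominates the sum and the gap closes.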

For stabilizing plants over deterministic rate limited channels, [8] showed that a rate $R > R^*$, where $R^*$ is as in (23), is necessary and sufficient. So, asymptotically the sufficient condition for the rate $R$ in Theorem 8.1 is tight. But it is not clear if one can do with an exponent smaller than $\beta^* = 2\log_2 \max_i |\mu_i|$ asymptotically when there is no feedback. Though the above limiting case allows one to obtain a tight and an intuitively pleasing characterization of the rate and exponent needed, it should be noted that this may not be operationally practical. For, if one encodes the measurements every $\ell$ time steps, even though Theorem 10.1 guarantees stability, the performance of the closed loop system (the LQR cost, say) may be unacceptably large because of the delay we incur. This is what motivated us to present the sufficient condition in the form that we did above.



B. A Comment on the Trade-off Between Rate and Exponent 

Once a set of rate-exponent pairs $(R, \beta)$ that can stabilize a plant is available, one would want
to identify the pair that optimizes a given cost function. Higher rates provide finer resolution of 
the measurements while larger exponents ensure that the controller's estimate of the plant does 
not drift away; however, we cannot have both. One can either coarsely quantize the measurements 
and protect the bits heavily or quantize them moderately finely and not protect the bits as much. 
One can easily cook up examples using an LQR cost function with the balance going either 
way. Studying this trade-off is integral to making the results practically applicable. 



C. Stabilizable Region 



Using the thresholds obtained in Theorem 6.2 and the asymptotic sufficient condition in Theorem 10.1, we can discuss the range of the eigen values of $F$, i.e., $\{|\mu_i|\}_{i=1}^{m_x}$, for which the $\eta^{th}$ moment of $x_t$ in (3) can be stabilized over some common channels. Since we are interested in the asymptotics, we assume the same limiting case as in Section X-A. Firstly, consider the scalar case, i.e., $m_x = 1$ and let the eigen value be $\mu$. An anytime reliable code with rate $R$ and exponent $\beta$ can stabilize the process in (3) for all $\mu$ such that

\[
\log_2 |\mu| < \min\left\{ R, \frac{\beta}{\eta} \right\}
\]

So, a scalar unstable linear process in (3) can be stabilized over a MBIOS channel with Bhattacharya parameter $\zeta$ provided

\[
\log_2 |\mu| < \log_2 |\mu_{\max}| = \sup_{R < C,\ \beta < E_{\zeta}(R)} \min\left\{ R, \frac{\beta}{\eta} \right\}
\tag{24}
\]

The stabilizable region as implied by the threshold in [13] is given by

\[
\log_2 |\mu| < \log_2 |\mu_{\max}| = \sup_{R < C,\ \beta < E_r(R)} \min\left\{ R, \frac{\beta}{\eta} \right\}
\]

For $\eta = 2$, the stabilizable region for the BEC and BSC is shown in Fig. 4, where $|\mu_{\max}|$ is plotted against the channel parameter. Consider a vector valued process with unstable eigen values $\{\mu_i\}_{i=1}^{m}$. Such a process can be stabilized by a rate $R$ and exponent $\beta$ anytime reliable code provided $R > \sum_{i=1}^{m} \log_2 |\mu_i|$ and $\beta > \eta \log_2 (\max_i |\mu_i|)$. So, given a channel with Bhattacharya parameter $\zeta$ for which the rate exponent curve $(R, E_{\zeta}(R))$ is achievable, the region of unstable eigen values that can be stabilized is given by $\{\mu \in \mathbb{R}^m \mid \exists\, R < C \ni \sum_{i=1}^{m} \log_2 |\mu_i| < R \text{ and } \eta \log_2 (\max_i |\mu_i|) < E_{\zeta}(R)\}$, where $C$ is the Shannon capacity of the channel.

(a) Each curve represents the outer boundary of the stabilizable region. (b) Stabilizable region with and without feedback.

Fig. 5. Comparing the stabilizable region of different channels

For example, let $m = 2$ and $\eta = 2$. Fig. 5(a) shows the region of $(|\mu_1|, |\mu_2|)$ that can be stabilized over three different channels, a binary symmetric channel with bit flip probability 0.1 and binary erasure channels with erasure probabilities 0.1 and 0.2 respectively.
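The membership test for the stabilizable region can be written directly from the two inequalities on the rate and the exponent. The following is a minimal sketch: the function name is hypothetical, and the rate-exponent pairs in `curve` are made-up placeholders rather than the achievable curve of any particular channel.

```python
import math

def in_stabilizable_region(mu, curve, eta=2):
    """True if some achievable (R, E) pair covers both the rate and exponent needs."""
    unstable = [abs(m) for m in mu if abs(m) > 1]
    if not unstable:
        return True                                   # nothing to stabilize
    rate_needed = sum(math.log2(m) for m in unstable)
    exponent_needed = eta * math.log2(max(unstable))
    return any(rate_needed < R and exponent_needed < E for R, E in curve)

# Hypothetical sample points (R, E) on an achievable rate-exponent curve.
curve = [(0.2, 3.0), (0.5, 1.5), (0.8, 0.4)]
inside = in_stabilizable_region([1.3, 1.05], curve)
outside = in_stabilizable_region([1.3, 1.1], curve)
```

A pair of eigen values can fall outside the region either because the total rate is too large or because the largest eigen value demands too high an exponent at every admissible rate.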

We will now compare these results with the case when there is perfect feedback of the channel outputs at the observer/encoder. [27] considered a priority queuing method for stabilizing vector valued unstable processes over channels with perfect feedback. Bits from different unstable subsystems are placed in a FIFO queue. Bits are given preference in decreasing order of the size of the eigen value of the corresponding subsystem. So, bits coming from a subsystem with a larger eigen value are given preference over those from a subsystem with a smaller eigen value. A bit is removed from the queue once it is received correctly. Since the feedback anytime capacity of a binary erasure channel is known [31], one can use Theorem 6.1 in [27] to derive the region of eigen values that can be stabilized by such a scheme. In Fig. 5(b), we compare the region of $(|\mu_1|, |\mu_2|)$ that can be stabilized with and without feedback over a binary erasure channel with erasure probability 0.2. As one would expect, the region is much larger when there is feedback. Note that the stabilizable regions in Fig. 5 are only achievable and not necessarily tight.



XI. Simulations

We present two examples and stabilize them over a binary erasure channel with erasure probability $\epsilon = 0.3$. The number of channel uses per measurement is fixed to $n = 15$. In both cases, time invariant codes $H_{n,R}$ from the Toeplitz ensemble, for an appropriate rate $R$, were randomly generated and decoded using Algorithm 1. The controller uses the Hypercuboidal filter to estimate the state.



A. Cart-Stick Balancer 



The system parameters for a cart-stick balancer (also commonly called the inverted pendulum on a cart) with state variables of stick angle, stick angular velocity, and cart velocity, when sampled with sampling duration 0.1s are (Exercise 10.15 in [32])

\[
F = \begin{bmatrix} 1.161 & 0.105 & 0 \\ 3.3 & 1.161 & 0.002 \\ -3.265 & -0.160 & 0.979 \end{bmatrix},
\quad
G = [-0.003 \;\; -0.068 \;\; 0.859]^T,
\quad
H = [1 \;\; 0 \;\; 0]
\]

The characteristic polynomial of $F$ is $x^3 - 3.3x^2 + 3.27x - 0.98$ and its eigen values are 1.75, 0.98 and 0.57. So, $F$ is open loop unstable. Each component of the process noise and measurement noise is i.i.d. zero mean Gaussian with variance 0.01 truncated to lie in $[-0.025, 0.025]$. The control input is given by $u_t = -K\hat{x}_{t|t}$, where $K = [-81.55 \;\; -14.37 \;\; -0.04]$. One can verify that $F - GK$ is stable. In order to apply Theorem 8.1, we write $F$ in the following canonical form

\[
F_0 = \begin{bmatrix} 3.3 & 1 & 0 \\ -3.27 & 0 & 1 \\ 0.98 & 0 & 0 \end{bmatrix}
\]

Applying Theorem 8.1, one can stabilize $x_t$ in the mean squared sense provided the exponent $n\beta > 2\log_2 \rho(\bar{F}_0) = 4.1035$ and the rate $nR = k > \log_2(3.3 + 3.27 + 0.98) = 2.92$. For $k = 5$, there exist anytime reliable codes with exponent up to $n\beta = 4.27$. Fig. 6 plots a sample path of the above system for a randomly chosen Toeplitz code. It is clear from Fig. 6(b) that the plant is stabilized.
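The numbers quoted for this example can be reproduced with a short script. This is a sketch under stated assumptions: the missing entry in the first row of $F$ is taken to be zero (which matches the stated characteristic polynomial), $\bar{F}_0$ is taken to be the entry-wise absolute value of $F_0$, logarithms are base 2, and the helper names are hypothetical.

```python
import numpy as np

F = np.array([[1.161, 0.105, 0.0],      # third entry assumed to be zero
              [3.3, 1.161, 0.002],
              [-3.265, -0.160, 0.979]])
a = np.array([-3.3, 3.27, -0.98])       # coefficients of x^3 - 3.3x^2 + 3.27x - 0.98

def companion(a):
    """Canonical form with first column -a_i and ones on the superdiagonal."""
    m = len(a)
    F0 = np.zeros((m, m))
    F0[:, 0] = -np.asarray(a, dtype=float)
    F0[np.arange(m - 1), np.arange(1, m)] = 1.0
    return F0

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

F0 = companion(a)
rho_F = spectral_radius(F)                                # about 1.75
n_beta_min = 2 * np.log2(spectral_radius(np.abs(F0)))     # exponent threshold
n_rate_min = np.log2(np.sum(np.abs(a)))                   # log2(7.55)
```

With $n = 15$ and $k = 5$ source bits per measurement, both thresholds are cleared, in line with the choice made in the text.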





(a) The stick does not deviate by more than 3 degrees from the 
vertical 

Fig. 6. A sample path 



(b) This shows that the plant is stabilized 



B. Example 2 

This example is aimed at exploring the trade off between the resolution of the quantizer and the error performance of the causal code. Consider a 3-dimensional unstable system (3) with

\[
F = \begin{bmatrix} 2 & 1 & 0 \\ 0.25 & 0 & 1 \\ -0.5 & 0 & 0 \end{bmatrix},
\]

$G = I_3$ and $H = [1 \; 0 \; 0]$. Each component of $w_t$ and $v_t$ is generated i.i.d. $\mathcal{N}(0, 1)$ and truncated to $[-2.5, 2.5]$. The eigen values of $F$ are $\{2, -0.5, 0.5\}$ while $\rho(\bar{F}) = 2.215$. The observer has access to the control inputs and we use the hypercuboidal filter outlined in Section B.



Using Theorem 8.1, the minimum required bits and exponent are given by $k = nR \ge 2$ and $n\beta \ge 2\log_2 2.215 = 2.29$. The control input is $u_t = -\hat{x}_{t|t-1}$. For $k \le 7$, $n\beta \ge 2.32$. If $k = 8$, $n\beta = 1.32 < 2.29$. For each value of $k$ ranging from 3 to 7, 1000 codes were generated from the Toeplitz ensemble. For each code, the system was simulated over a horizon of 100 time instants and the LQR cost has been averaged over 100 such runs. For a time horizon $T$, the LQR cost

the cumulative distribution function of the 



\ut\\ ). In Fig 7(a) 



is defined as ^ Z]t=o ^ (11^^ 11^ 
LQR cost is plotted for $3 \le k \le 7$. The x-axis denotes the proportion of codes for which the LQR cost is below a prescribed value, e.g., with $k = 6$, $n = 15$, the cost was less than 15 for 85% of the codes while with $k = 5$, $n = 15$, this fraction increases to more than 95%. The




(a) The CDF of the LQR costs for different values of the rate. (b) The LQR cost averaged over the 1000 randomly generated codes is plotted against $k$.

Fig. 7. The best choice of the rate is $R = 5/15 = 0.33$



competition between the rate and the exponent in determining the LQR cost is evident when we look at Fig. 7(b). When $k = 3$, the error exponent $n\beta = 6.3$ is large. So, at any time $t$, the decoder decodes all the source bits $\{b_\tau\}_{\tau \le t-1}$ with a high probability. Hence, the limiting factor on the LQR cost is the resolution that the source bits $b_t$ provide on the measurements. But when $k = 7$, the measurements are quantized to a high resolution but the decoder makes errors in decoding the source bits. So, the best choice appears to be $k = 5$.

Appendix



A. Proof of Theorem 5. 1 



We will begin with some preliminary observations. 

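Lemma A.1 ([33]): Let $V$ be an $m$-dimensional vector space over $GF_2$ and define a probability function over $V$ such that, for each $v \in V$, $P(v) = p^{\|v\|}(1 - p)^{m - \|v\|}$. If $U$ is an $\ell$-dimensional subspace of $V$, then

\[
P(U) \le \max(p, 1 - p)^{m - \ell}
\]

Proof: Suppose $p \le 1/2$. The proof for the other case is analogous. Let $E$ be the set of unit vectors, i.e., $E = \{v \in V \mid \|v\| = 1\}$. Then there is a subset, $E'$, of $E$ with $m - \ell$ unit vectors such that $V = U \oplus \mathrm{span}(E')$ and $U \cap \mathrm{span}(E') = \{0\}$. Let $u' \in \mathrm{span}(E')$, then

\[
P(U + u') = \sum_{u \in U} P(u + u') \ge \sum_{u \in U} P(u) \left( \frac{p}{1 - p} \right)^{\|u'\|} = P(U) \left( \frac{p}{1 - p} \right)^{\|u'\|}
\]

Note that for distinct $u_1', u_2' \in \mathrm{span}(E')$, $(U + u_1') \cap (U + u_2') = \emptyset$. Also note that $\|u'\| \le m - \ell$ for all $u' \in \mathrm{span}(E')$. Hence

\[
1 = P(V) = P\Big( \bigcup_{u' \in \mathrm{span}(E')} (U + u') \Big) \ge \sum_{u' \in \mathrm{span}(E')} P(U) \left( \frac{p}{1 - p} \right)^{\|u'\|}
\]

Observe that there are exactly $\binom{m - \ell}{i}$ vectors in $\mathrm{span}(E')$ with Hamming weight $i$. So, we have

\[
1 \ge P(U) \sum_{i=0}^{m - \ell} \binom{m - \ell}{i} \left( \frac{p}{1 - p} \right)^{i} = P(U) \left( \frac{1}{1 - p} \right)^{m - \ell}
\]

This completes the proof. ■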
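The bound of Lemma A.1 is easy to check by brute force on a small subspace. The snippet below is only an illustration: the function names and the example basis are hypothetical, and the basis vectors are assumed to be linearly independent.

```python
from itertools import product

def span_gf2(basis):
    """All GF(2) linear combinations of the given basis vectors (tuples of 0/1)."""
    m = len(basis[0])
    vectors = set()
    for coeffs in product((0, 1), repeat=len(basis)):
        v = tuple(sum(c * b[i] for c, b in zip(coeffs, basis)) % 2 for i in range(m))
        vectors.add(v)
    return vectors

def subspace_probability(vectors, p):
    """Sum of p^{|v|} (1-p)^{m-|v|} over the vectors, |v| being the Hamming weight."""
    m = len(next(iter(vectors)))
    return sum(p ** sum(v) * (1 - p) ** (m - sum(v)) for v in vectors)

p, m = 0.3, 4
U = span_gf2([(1, 1, 0, 0), (0, 0, 1, 1)])       # a 2-dimensional subspace of GF(2)^4
dim_U = len(U).bit_length() - 1                   # |U| = 2^dim
prob_U = subspace_probability(U, p)               # 0.7^4 + 2*0.3^2*0.7^2 + 0.3^4
bound = max(p, 1 - p) ** (m - dim_U)              # 0.7^2
```

Here the subspace carries probability 0.3364 against a bound of 0.49; the bound is met with equality when $U$ is spanned by unit vectors and $p \le 1/2$.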
Remark 1: The Toeplitz parity check matrix is full rank if and only if Hi is full rank. 
This is why we fix Hi to be a full rank matrix in the definition of the Toeplitz ensemble. 

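Recall that we choose the entries of $H_i$ to be i.i.d. Bernoulli($p$) for $i \ge 2$. Also suppose $p \le 1/2$. The results for $p > 1/2$ are obtained by replacing $p$ with $1 - p$ in the subsequent analysis. Consider an arbitrary decoding instant, $t$. Since $w^t_{\min,d} = w^{t'}_{\min,d}$ and $N^t_{w,d} = N^{t'}_{w,d}$ for all $t, t'$, we will drop these superscripts and write $w_{\min,d}$ and $N_{w,d}$. Let $c = [c_1^T, \ldots, c_t^T]^T$, where $c_i \in \{0, 1\}^n$, be a fixed binary word such that $c_{\tau < t-d+1} = 0$ and $c_{t-d+1} \ne 0$. Also, let $H_{n,R}$ be drawn from the ensemble $\mathbb{TZ}_p$ and let $H^t_{n,R}$ denote the $\bar{n}t \times nt$ principal minor of $H_{n,R}$. We examine the probability that $c$ is a codeword of $H^t_{n,R}$, i.e., $P(H^t_{n,R} c = 0)$. Now, since $c_{\tau < t-d+1} = 0$, $H^t_{n,R} c = 0$ is equivalent to

\[
\begin{bmatrix}
H_1 & 0 & \cdots & 0 \\
H_2 & H_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
H_d & H_{d-1} & \cdots & H_1
\end{bmatrix}
\begin{bmatrix} c_{t-d+1} \\ c_{t-d+2} \\ \vdots \\ c_t \end{bmatrix}
= 0
\tag{25}
\]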








Note that ( 25 1 can be equivalently written as follows 



Ct-d+i 







' hi ' 




' " 


Ct-d+2 


Ct-d+i 




h2 







Ct 


Ct-1 ■ ■ ■ Ct-d+i 




hd 








(26) 



where hi = \qc{HJ), i.e., /ij is a nri x 1 column obtained by stacking the columns of Hj one 
below the other, and Ci E {0, is obtained from q as follows. 



a 



ci ... 

cj 
... 



(27) 



Since Hi is fixed, we will rewrite (26) as 



Ct-d+i 
Ct-d+2 Ct-d+1 



t-i 



t-2 



a- 









Ct-d+2 




CO 




Ct-d+3 











hi, Ct-d+ihi — 



(28) 



=h 

Since Ct-d+i 7^ 0, Ct_d+i has full rank n and consequently C has full rank {d — l)n. Since C 
is an (d — l)n x (d — l)nn matrix, its null space has dimension (d — l)nn — {d — l)n. For 
( |28| ) to hold, h must lie in an {d — l)nn — (d — l)n dimensional flat which is contained in an 



{d — l)nn — {d — l)n + 1 dimensional subspace. Using Lemma A.l we have 

p{ml^c = 0) < {1 - pf^'-'^-' 

=^ P {w^:n,d < and) < (1 (t) 

w'<and ^ 

< (1 — p)"('^~i)~i2"'^''^''"'' 

where ?7 = (1 — p)^"^^ Similarly, 



(29) 



(30) 



^nd 



< r]2 



w 



(1-p) 



nd 



< 



-nd{ew /nd- H{w/nd)+{l~B.) log2(l/(l-p))) 



(31) 



For convenience, define 

6i = (l-R)log,(l/(l-p))-H(a) 

We need to choose 9 such that 62,w > 5 > for all a < ^ < 1. Now, define 

r = ma.^M^(l^ (32) 

x>a X 

Then for each 9 > 9*, there is a 5 > such that 52,w > S for all and < w < nd. A simple 
calculation gives 9* = log2 ( 2i-fl_i ) • For such a choice of 6* > 9*, continuing from pT] ), we 
have 

P{3and<w <nd 3 N^^^ > 2^'") < nd2-'"^^ (33) 



for some 5' > 0. For some fixed rfo large enough, applying a union bound over d > d^, to ([SO]) 



and ( [33] ), we get 

P{3d>do 3 < a^^c/ or A^^^^ > 2^"") < 2-^("'^°) (34) 



B. Proofs of Theorems 8. 1 and 9. 1 



1) Proof of Theorem 8.1: The analysis will proceed in two steps. We will first determine a sufficient condition on the number of bits per measurement, $nR$, that are required to track (3) when these bits are available error free. We will then determine the anytime exponent $n\beta$ needed in decoding these source bits when they are communicated over a noisy channel.

Let $\Delta_{t|\tau} = x_{\max,t|\tau} - x_{\min,t|\tau}$ be the uncertainty in $x_t$ using $\{b_{t'}\}_{t' \le \tau}$, i.e., quantized measurements up to time $\tau$. For convenience, let $\Delta_t = \Delta_{t|t-1}$. Then, the time update is given by the following Lemma.
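Lemma A.2 (Time Update): The time update relating $\Delta_{t+1}$ and $\Delta_{t|t}$ is given by $\Delta_{t+1} = \bar{F}\Delta_{t|t} + W 1_{m_x}$.

Proof: From the system dynamics in (3), the following is immediate

\[
\Delta^{(i)}_{t+1} = W + \max\big(\pm a_i \Delta^{(1)}_{t|t}\big) + \Delta^{(i+1)}_{t|t} = |a_i| \Delta^{(1)}_{t|t} + \Delta^{(i+1)}_{t|t} + W, \quad i \le m_x - 1
\]
\[
\Delta^{(m_x)}_{t+1} = |a_{m_x}| \Delta^{(1)}_{t|t} + W
\]

In short, the above equations amount to $\Delta_{t+1} = \bar{F}\Delta_{t|t} + W 1_{m_x}$. ■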
Towards the measurement update, the observer simply quantizes the measurements $y_t$ according to a $2^{nR}$-regular lattice quantizer with bin width $\delta$, i.e., the quantizer is defined by $Q : \mathbb{R} \to \{0, 1, \ldots, 2^{nR} - 1\}$, where $Q(x) = \lfloor x/\delta \rfloor \bmod 2^{nR}$. In order for this to work, we need $\delta 2^{nR} \ge \Delta^{(1)}_t$ for any time $t$. Assuming that the rate, $R$, is large enough, we will first find the steady state value of the recursion for $\Delta_t$, which we then use to determine $R$. At each time $t$, the observer can communicate the measurement $y_t$ to within an uncertainty of $\delta$, i.e., the estimator knows that the measurement lies in an interval of width $\delta$. Adding to this the effect of the observation noise, $-\frac{V}{2} \le v_t \le \frac{V}{2}$, the estimator knows $x^{(1)}_t$ to within an uncertainty of $\Delta^{(1)}_{t|t} = \delta + V$. Note that $\Delta^{(i)}_{t|t} = \Delta^{(i)}_t$ for $i \ne 1$. Combining this observation with Lemma A.2, it is straightforward to see that $\Delta_t$ converges, to say $\Delta_{tu}$, in exactly $m_x$ time steps, i.e., $\Delta_t = \Delta_{tu}$ for all $t \ge m_x$. The subscript 'tu' in $\Delta_{tu}$ denotes 'time update'. The following result is now immediate.
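Lemma A.3 (Steady State value of $\Delta_t$): $\Delta_{tu} = (\delta + V) L_u \bar{a} + W L_u 1_{m_x}$, where $\bar{a} = [|a_1|, \ldots, |a_{m_x}|]^T$ and $L_u = [\ell_{ij}]_{1 \le i,j \le m_x}$ with $\ell_{ij} = 1$ if $j \ge i$ and $\ell_{ij} = 0$ otherwise.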
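The steady state claimed in Lemma A.3 can be checked by iterating the measurement and time updates directly. This sketch assumes the companion canonical form with first column $-a_i$ and an upper triangular all-ones $L_u$; the function names and numerical values are hypothetical.

```python
import numpy as np

def steady_state_width(a, delta, V, W):
    """Closed form (delta + V) * L_u |a| + W * L_u 1 with L_u upper triangular ones."""
    abs_a = np.abs(np.asarray(a, dtype=float))
    m = abs_a.size
    L_u = np.triu(np.ones((m, m)))
    return (delta + V) * (L_u @ abs_a) + W * (L_u @ np.ones(m))

def iterate_widths(a, delta, V, W, width0, steps):
    """Alternate the measurement update (first component) and the time update."""
    abs_a = np.abs(np.asarray(a, dtype=float))
    m = abs_a.size
    F_bar = np.zeros((m, m))
    F_bar[:, 0] = abs_a
    F_bar[np.arange(m - 1), np.arange(1, m)] = 1.0
    width = np.asarray(width0, dtype=float)
    for _ in range(steps):
        post = width.copy()
        post[0] = delta + V              # measurement update on x^(1) only
        width = F_bar @ post + W         # time update of Lemma A.2
    return width

a = [-2.0, -0.25, 0.5]
closed_form = steady_state_width(a, delta=1.0, V=5.0, W=5.0)
after_m_steps = iterate_widths(a, 1.0, 5.0, 5.0, width0=[100.0, 100.0, 100.0], steps=3)
```

Whatever the initial widths, the recursion settles after exactly $m_x$ steps because the last component is fixed after one step, the one above it after two, and so on.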

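Now, we need to go back and calculate $R$. So we just need $\delta 2^{nR} \ge \max\{\Delta^{(1)}_0, \Delta^{(1)}_1, \ldots, \Delta^{(1)}_{m_x}\}$. Further, a simple calculation gives $\lim_{\delta \to \infty} \Delta^{(1)}_{tu}/\delta = |a_1| + \ldots + |a_{m_x}|$. The minimum rate is thus given by $\frac{1}{n}\log_2 \sum_{i=1}^{m_x} |a_i|$ and this completes the proof of Theorem 8.1.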
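The modulo lattice quantizer used in this proof, and the way the estimator resolves the modulo ambiguity from its prior interval, can be sketched as follows. The function names are hypothetical, and the decoder assumes, slightly more conservatively than the condition in the text, that the measurement lies within $\delta(2^{nR} - 1)$ of the lower end `lo` of the prior interval.

```python
import math

def quantize(y, delta, bits):
    """Q(y) = floor(y / delta) mod 2^bits."""
    return math.floor(y / delta) % (1 << bits)

def dequantize(index, delta, bits, lo):
    """Bin [left, left + delta) matching index, assuming lo <= y <= lo + delta*(2^bits - 1)."""
    period = 1 << bits
    base = math.floor(lo / delta)            # bin number that contains lo
    offset = (index - base) % period         # bins to step forward from base
    left = (base + offset) * delta
    return left, left + delta

delta, bits = 0.5, 3
idx = quantize(12.9, delta, bits)                       # floor(25.8) mod 8
left, right = dequantize(idx, delta, bits, lo=10.3)
```

Only `bits` bits are sent per measurement, yet the estimator recovers the bin uniquely because its prior uncertainty is narrower than one period of the quantizer.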



2) Proof of Theorem 9.1: The proof is very similar to that of Theorem 8.1. The observations are quantized as follows. At any time, for $1 \le i \le q$, the $i^{th}$ component of the measurement vector is quantized using a $2^{nR_i}$-regular lattice quantizer with bin width $\delta_i$. The remaining components of the measurement vector are ignored. The overall rate, $R$, is then given by $R = R_1 + R_2 + \cdots + R_q$. The time update again is given by $\Delta_{t+1} = \bar{F}\Delta_{t|t} + W 1_{m_x}$. The limiting values of $\{R_i\}_{i=1}^{q}$ are
obtained by letting 5i — 00 and 00. An argument similar to the one in the previous 



section gives the following threshold, i?j > ^ max {0, log (|aj i| 



C. The Minimum Volume Ellipsoid 
Lemma A.4 (Theorem 6.1 [34^]): The minimum volume ellipsoid £(P,c) covering 

ja; e M^la; G S{P, 0),^Vh^Ph < {h, x) < SVh^Phj 

where \5\ > I7I, is given by 

Phh^P Ph 
P = bP-{b-a)^^^^, c = ^-^ (35) 
h^Ph WWPh 

where 

1) If 75 < then ^ = 0, a = 6= l 

2) If 7 + 5 = and 75 > then 

2 , mil -5^) 



^ = 0, a = mS , b 

m — 1 

3) If 7 + 5 ^ and 7(5 > then 

m(7 + 5)2 + 2(l + 75) - 



2(m + l)(7 + 5) 



a = - 7)(5 - 0, & = ".^^^ X2 
where D = m^{5'^ - 7^)2 + 4(1 - 7^)(1 - 5^) 

If \6\ < I7I, change x to —x and apply the above result. And it is easy to verify that P is indeed 
positive semidefinite. Also, a quick calculation shows that 7 < ^ ^- This confirms the intuition 
that the center of the minimum volume ellipsoid lies within the slab. 



D. Proof of Theorem 8.3 



The proof is in the same spirit as that of Theorem 8.1 We will first determine a sufficient 
condition on the number of bits per measurement, nR, that are required to track ([3]) when these 
bits are available error free. We will then determine the anytime exponent nf3 needed in decoding 
these source bits when they are communicated over a noisy channel. 

Consider the time update in (|18a[). Let P/-' denote the {i,jy^ element of Pt, then the time 



update implies 



^1 = (1 + e') (alPlt + Pl^'^' - ' - a.^^'' ) + — , 1 < . < - 1 (36a) 



4e' 



PZr- = (1 + e')«L^t" + ^ 



(36b) 



Since the matrix Pt\t is positive semidefinite, we have -P^f^^ 



P^^^ and (i^f ^ 1 < 



pll pi+l,i+l 



Using this in (36a), for 1 < i < — 1, we get 



piu<ii + e')(\a.\jps + jp:;'''^' 



4e' 



(37) 



This prompts us to bound the recursion (18) by bounding the diagonal elements of Pf Now, 



considering the measurement update ( 18b), it is easy to see that 



PS = 

atPi' <Pl< btPt 



(38a) 
(38b) 



We will first show that bt < 
Lemma A.5: bt < 



rrix — l ' 



m^ — 1 



Proof: To prove this, consider the setup of Lemma A. 4 and suppose \6\ > \^\. Then, in 
cases 1) and 2), it is clear that b < since \5\, < 1. In case 3), we have 



1-7^ 



1 - - 7) V« 



1 - 



€-7 



< 



1 



g-7 



It suffices to show that ^ — ^ < 5 — ^. This easily follows from the formulae in case 3). The 
proof for the case when \5\ < is obtained by replacing ^ with — ^. ■ 
Like in Section |b| the observer quantizes the measurements yt according to a 2"^— regular 
lattice quantizer with bin width 5. In order for the controller to know yt to within a resolution 



of 5, it is not hard to see that one needs 52"^ > 2a/P/^ + v. We begin by assuming that the 
rate R is large enough to provide the same resolution 6 on yt at each time t. The actual rate 



required to accomplish this will be calculated determining an asymptotic upper bound on 



11 



So, at time t, the controller knows that yt to within a resolution 6 and hence to within a 
resolution of 6 + V. Suppose y^-ft < ^ < ^/W^t, where ^/Wi^t -It) <5 + V. Then 



using Lemma |A.4| and noting that 7t < < 5t, we have 



at = m^i^t - 7i)(5t - ^t) <'-^{St- 7i)^ 



Using this in (38a), we get 



(39) 



Combining Lemma A. 5 and (39), we get 



/ptt < 

*l* - \/ - 1 



/p;\ z^i 



(40a) 
(40b) 



In the following Lemma, we will develop an upper bound on the diagonal elements of Pt 
which will help us determine an upper bound on P/^. 

Lemma A.6: Let Ag o £ 1^™^ be such that Ag*Q = Pq for 1 < i < rrix and suppose its 
evolution is governed by 



+1 



e,t\t 



;i + e')^FA,,i|i + 



6+V i=\ 



w 

1 



9A 



e,t 



Z^l 



where 9 = ./^^. Then JPf' < A^l and JP,l < A^*^,, for all t and 1< z < 

V nix — I V t — e,t y t\t — e,t\t — — 



Proof: The proof follows by combining the observations from (36), (37) and (40). ■
Note that the recursion for $\Delta_{\epsilon,t}$ above is very similar to that for $\Delta_t$ in Section B. So, the steady state value of $\Delta_{\epsilon,t}$ can be determined by a calculation similar to that in Lemma A.3. The desired threshold for $R$ is obtained by letting $\delta \to \infty$ for a fixed $\epsilon'$. Since $\epsilon'$ can be made arbitrarily small, we get the following bound on $R$:



R>-log 
n 



/m. 



i=l 



Now, we need to determine the exponent needed to track (3) with a bounded mean squared error. In the absence of any measurements, it is easy to see from (18a) that the growth of $P_t$ is determined by the spectral radius of $\sqrt{1+\epsilon'}F$. Since $\epsilon'$ can be made arbitrarily small, in order to track (3) with a bounded mean squared error, we need an anytime exponent $n\beta > 2\log\rho(F)$. This completes the proof.
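The growth claim can be checked numerically with a small sketch. It assumes that, without measurements, the covariance bound evolves as $P \leftarrow (1+\epsilon') F P F^T$ with the additive noise term dropped, since that term does not affect the exponential growth rate. The function name and the example matrix are hypothetical.

```python
import numpy as np

def open_loop_growth_rate(F, eps, steps=200):
    """Empirical per-step exponent (base 2) of trace(P) under P <- (1+eps) F P F^T."""
    P = np.eye(F.shape[0])
    log_scale = 0.0
    for _ in range(steps):
        P = (1.0 + eps) * F @ P @ F.T
        s = np.trace(P)
        log_scale += np.log2(s)
        P = P / s  # renormalize to unit trace to avoid overflow
    return log_scale / steps

if __name__ == "__main__":
    # Illustrative 2x2 matrix with eigenvalues 2 and -0.5.
    F = np.array([[1.5, 1.0], [1.0, 0.0]])
    eps = 0.01
    rho = max(abs(np.linalg.eigvals(F)))
    print(open_loop_growth_rate(F, eps))
    print(2 * np.log2(np.sqrt(1 + eps) * rho))
```

The two printed values should agree closely, and as `eps` shrinks the exponent approaches $2\log\rho(F)$, which is the quantity the anytime exponent has to exceed.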

E. The Limiting Case 

There are several bounds in the mathematics literature on the roots of a polynomial in terms of the polynomial coefficients. A standard and near-optimal one is Fujiwara's bound, which we state below.

Lemma A.7 (Fujiwara's Bound): Consider the monic polynomial with complex coefficients $f(z) = z^m + c_1 z^{m-1} + \ldots + c_m$ and let $\rho(f)$ denote the magnitude of its largest root. Then

$$\rho(f) \le K(f) = 2\max\left\{ |c_1|,\ |c_2|^{1/2},\ \ldots,\ |c_{m-1}|^{1/(m-1)},\ \left|\frac{c_m}{2}\right|^{1/m} \right\}$$

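Fujiwara's bound is easy to evaluate numerically. The following minimal sketch compares it with the largest root magnitude computed by `numpy.roots`; the function names and the example polynomial are hypothetical and chosen only for illustration.

```python
import numpy as np

def fujiwara_bound(coeffs):
    """K(f) for the monic polynomial z^m + c_1 z^(m-1) + ... + c_m, coeffs = [c_1, ..., c_m]."""
    mags = np.abs(np.asarray(coeffs, dtype=complex))
    k = np.arange(1, len(mags) + 1)
    mags[-1] = mags[-1] / 2.0  # the last coefficient enters as |c_m / 2|
    return 2.0 * np.max(mags ** (1.0 / k))

def largest_root_magnitude(coeffs):
    """Magnitude of the largest root of the same monic polynomial."""
    return np.max(np.abs(np.roots(np.concatenate(([1.0], coeffs)))))

if __name__ == "__main__":
    # f(z) = z^3 - 6 z^2 + 11 z - 6 = (z - 1)(z - 2)(z - 3)
    c = [-6.0, 11.0, -6.0]
    print(largest_root_magnitude(c), fujiwara_bound(c))
```

For this example the largest root has magnitude 3 and the bound evaluates to 12, so the bound holds but is not tight here.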



We will detail the proof for the case of scalar measurements. The extension to vector measurements will then suggest itself. Let $F$ be any $m_x$-dimensional square matrix and let $f(z)$ denote its characteristic polynomial. Then the following bounds hold (for details see [35])



p{F) < p{F) < K{f) < 2p{F) (41) 

By the hypothesis of the Lemma, the eigenvalues of $F_n$ are of the form $\{\mu_i^n\}_{i=1}^{m_x}$. To emphasize the fact that $F$ depends on $n$, we write it as $F_n$ and $a_i$ as $a_{i,n}$. Recall that the characteristic polynomial of $F_n$ is given by $f_n(z) = z^{m_x} + a_{1,n} z^{m_x-1} + \ldots + a_{m_x,n}$. Let $\mathcal{I}_u = \{i \mid |\mu_i| > 1\}$; then the following is easy to prove

lim , ^"''"^ , =0,i^ \Iu\, lim -logg = logs \f^i\ (42) 



i&Xu 



From (42), it is obvious that $\lim_{n\to\infty} R_n = \sum_{i \in \mathcal{I}_u} \log|\mu_i|$. The asymptotics of $R_{\epsilon,n}$, $R_{v,n}$ and $R_{\epsilon v,n}$ can be similarly derived. Also, from (41), it is clear that $\lim_{n\to\infty} \frac{1}{n}\log K(f_n) = \lim_{n\to\infty} \frac{1}{n}\log\rho(F_n)$. The asymptotics of $\beta_n$ and $\beta_{v,n}$ now follow immediately.
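The mechanism behind limits of this kind can be seen numerically. When the roots of a monic polynomial are $n$-th powers $\mu_i^n$, the coefficient whose index equals the number of roots with $|\mu_i| > 1$ is dominated by the product of those roots, so its normalized logarithm tends to the sum of $\log|\mu_i|$ over the unstable roots. The sketch below assumes real roots for simplicity; the function name and the example values are hypothetical.

```python
import numpy as np

def coefficient_exponents(mus, n):
    """(1/n) * log2 |a_{k,n}| for the monic polynomial whose roots are mu_i ** n."""
    roots = np.asarray(mus, dtype=float) ** n
    a = np.poly(roots)[1:]  # a_{1,n}, ..., a_{m,n}; the leading 1 is dropped
    return np.log2(np.abs(a)) / n

if __name__ == "__main__":
    mus = [2.0, 1.5, 0.5]  # two unstable roots, one stable root
    num_unstable = sum(abs(mu) > 1 for mu in mus)
    target = sum(np.log2(abs(mu)) for mu in mus if abs(mu) > 1)
    for n in (5, 10, 20, 40):
        exps = coefficient_exponents(mus, n)
        print(n, exps, exps[num_unstable - 1], target)
```

As `n` grows, the entry at index `num_unstable - 1` approaches `target` and dominates the other entries.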



